(54) Object region data describing method and object region data creating apparatus



(57) An object region data describing method of de- 
scribing information about the object region in an image 
over a plurality of frames determines (S1 , S2) the region 
of a target object in an image using an approximate func- 
tion that approximates the trajectory obtained by arrang- 
ing (S3, S4), in the direction of frame advance, one rep- 
resentative point of an approximate figure for the object 
region and the difference values for determining the oth- 
er representative points and describes information 
about the object region using a parameter for the func- 
tion. 
Description

[0001] The present invention relates to an object re- 
gion data describing method of describing information 
about the object region in a video and an object region 
data creating apparatus. 

[0002] Hyper media are configured such that related 
information called a hyper link is given in between media,
such as videos, sounds or texts, to permit mutual
reference. When videos are mainly used, related infor- 
mation has been provided for each object which ap- 
pears in the video. When the object is specified, related 
information (text information or the like) is displayed. 
The foregoing structure is a representative example of 
the hyper media. The object in the video is expressed 
by a frame number or a time stamp of the video, and 
information for identifying a region in the video which 
are recorded in video data or recorded as individual da- 
ta. 

[0003] Mask images have frequently been used as 
means for identifying a region in a video. The mask im- 
age is a bit map image constituted by giving different 
pixel values between the inside portion of an identified 
region and the outside portion of the same. A simplest 
method has an arrangement that a pixel value of "1" is 
given to the inside portion of the region and "0" is given 
to the outside portion of the same. Alternatively, α values,
which are employed in computer graphics, are sometimes
used. Since the α value is usually able to express
256 levels of gray, a portion of the levels is used.
The inside portion of the specified region is expressed
as 255, while the outside portion of the same is expressed
as 0. The latter image is called an α map. When
the regions in the image are expressed by the mask im- 
ages, determination whether or not a pixel in a frame is 
included in the specified region can easily be made by
reading the value of the pixel of the mask image and by 
determining whether the value is 0 or 255. The mask 
image has freedom with which a region can be ex- 
pressed regardless of the shape of the region and even 
a discontinuous region can be expressed. The mask im- 
age must have pixels, the size of which is the same as 
the size of the original image. Thus, there arises a problem
in that the quantity of data cannot be reduced.
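
The mask-image test described above can be illustrated with a minimal sketch. The frame size, the pixel values and the function names are assumptions made for illustration only; they do not come from any particular system.

```python
# Hypothetical 6x4 frame: 255 marks pixels inside the object region, 0 outside.
mask = [
    [0,   0,   0,   0, 0, 0],
    [0, 255, 255, 255, 0, 0],
    [0, 255, 255, 255, 0, 0],
    [0,   0,   0,   0, 0, 0],
]


def in_region(mask, x, y):
    """Return True when pixel (x, y) lies inside the masked region."""
    return mask[y][x] == 255


def mask_size_in_pixels(mask):
    """The mask needs one value for every pixel of the original frame."""
    return len(mask) * len(mask[0])
```

The lookup is a single array access, which is why the test is easy, while the storage cost grows with the frame size regardless of how simple the region is.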
[0004] To reduce the quantity of data of the mask im- 
age, the mask image is frequently compressed. When 
the mask image is a binary mask image constituted by 
0 and 1, binary image processing can be applied.
Therefore, the compression method employed in fac- 
simile machines or the like is frequently employed. In 
the case of MPEG-4, which has been standardized by ISO/IEC
MPEG (Moving Picture Experts Group), an arbitrary
shape coding method is employed in which the
mask image constituted by 0 and 1 and the mask image
using the α value are compressed. The foregoing compression
method uses motion compensation
and is capable of improving compression efficiency.
On the other hand, complex compression and decoding
processes are required.
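
As a rough illustration of why a binary mask compresses well, the sketch below run-length encodes one row of a 0/1 mask. It is not the facsimile coding scheme nor the MPEG-4 shape coding itself; it only shows the run-length idea that facsimile coding builds on, and the function names are hypothetical.

```python
def run_lengths(row):
    """Encode a row of 0/1 pixels as lengths of alternating runs.

    By convention the first run counts 0-valued pixels, so a row that
    starts with 1 yields a leading run of length 0.
    """
    runs = []
    current = 0
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def decode_runs(runs):
    """Rebuild the pixel row from the alternating run lengths."""
    row = []
    value = 0
    for length in runs:
        row.extend([value] * length)
        value = 1 - value
    return row
```

Even in this simple form, reading one pixel requires decoding the runs that precede it, which hints at the direct-access difficulty of compressed masks.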

[0005] To express a region in a video, the mask image
or the compressed mask image has usually been employed.
However, data for identifying a region is required
to permit easy and quick extraction, to be small in
quantity and to permit easy handling. Stated another
way, the mask image is not suitable for identifying the
object region in the video since it has a large quantity of
data. The compressed mask image has a drawback in
that coding/decoding is complicated and the pixels
of a predetermined frame cannot be accessed directly,
which makes handling difficult.
[0006] Furthermore, only the positional information
about the object region is represented and information
about depth is not given. It is impossible to represent a
state where the object disappears temporarily behind
another thing. When shooting is done while the camera is
following the moving object, the actual motion of the object
is not represented. Thus, it is difficult to make a search
taking into account information about depth, disappearance
behind another thing (occlusion), and the movement
of the camera. Therefore, in searching, all the things,
including unrelated ones, must be processed.
[0007] Accordingly, the present invention is directed
to a method and apparatus that substantially obviate one
or more of the problems due to limitations and disadvantages
of the related art.

[0008] In accordance with the purpose of the invention,
as embodied and broadly described, the invention
is directed to an [...]

[0009] Another object of the invention is to provide an
object region data describing method and an object re- 
gion data creating apparatus which enable an object in 
an image to be searched for efficiently and effectively. 
[0010] According to embodiments of the present invention,
there is provided a method of describing object
region data about an object in video data over a plurality
of frames, the method comprising:

approximating the object using a figure for each of
the frames;

extracting a plurality of points representing the figure
for each of the frames;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, position data about one of the
plurality of points and relative position data about
remaining points with reference to the one of the
plurality of points; and

describing the object region data using the functions.
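
The arrangement of position data in this method can be sketched as follows. The sketch assumes that the points appear in the same order in every frame and that the first point is the one described by absolute position; both are illustrative assumptions, as are the function names. The function approximation step is left out.

```python
def to_reference_and_differences(frames):
    """Split per-frame point lists into one absolute and several relative trajectories.

    frames[k] is a list of (x, y) points for frame k, in the same order in
    every frame. Returns (reference, differences) where
      reference[k]      is the absolute position of point 0 in frame k, and
      differences[i][k] is the position of point i + 1 relative to point 0.
    """
    reference = [pts[0] for pts in frames]
    n_other = len(frames[0]) - 1
    differences = [
        [(pts[i + 1][0] - pts[0][0], pts[i + 1][1] - pts[0][1]) for pts in frames]
        for i in range(n_other)
    ]
    return reference, differences


def restore_points(reference, differences, k):
    """Recover all absolute point positions for frame k."""
    rx, ry = reference[k]
    return [(rx, ry)] + [(rx + d[k][0], ry + d[k][1]) for d in differences]
```

When the figure moves without changing shape, the relative trajectories stay constant, so they are cheap to approximate with functions.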

[0011] According to embodiments of the present invention,
there is provided another method of describing
object region data about an object in video data over a
plurality of frames, the method comprising:

approximating the object using a figure for each of
the frames;

extracting a plurality of points representing the figure
for each of the frames;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, position data about the plurality
of points in a reference frame and relative position
data about the plurality of points in a succeeding
frame with reference to the position data about
the plurality of points in the reference frame; and

describing the object region data using the functions.



[0012] According to embodiments of the present invention,
there is provided a further method of describing
object region data about an object in video data over a
plurality of frames, the method comprising:

approximating the object using a figure for each of
the frames;

extracting a plurality of points representing the figure
for each of the frames;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, data indicating positions of the
plurality of points; and

describing the object region data using the functions
and depth information of the object.

[0013] According to embodiments of the present invention,
there is provided still another method of describing
object region data about an object in video data
over a plurality of frames, the method comprising:

approximating the object using a figure for each of
the frames;

extracting a plurality of points representing the figure
for each of the frames;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, data indicating positions of the
plurality of points; and

describing the object region data using the functions
and display flag information indicating a range
of frames in which the object or each of the points
is visible or not.

[0014] According to embodiments of the present invention,
there is provided a still further method of describing
object region data about an object in video data
over a plurality of frames, the method comprising:

approximating the object using a figure for each of
the frames;

extracting a plurality of points representing the figure
for each of the frames;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, data indicating positions of the
plurality of points; and

describing the object region data using the functions
and object passing range information indicating
a range where the figure approximating the object
exists over the plurality of frames.

[0015] According to embodiments of the present invention,
there is provided a still further method of describing
object region data about an object moving in a
panorama image formed by combining a plurality of
frames so that they overlap, the method comprising:

approximating the object in the panorama image
using a figure;

extracting a plurality of points representing the figure
in a coordinate system of the panorama image;

approximating trajectories with functions, the trajectories
being obtained by arranging, in the frame
advancing direction, data indicating positions of the
plurality of points; and

describing the object region data using the functions.

[0016] This summary of the invention does not necessarily
describe all necessary features so that the invention
may also be a sub-combination of these described
features.

[0017] The present invention can be implemented either
in hardware or on software in a general purpose
computer. Further, the present invention can be implemented
in a combination of hardware and software. The
present invention can also be implemented by a single
processing apparatus or a distributed network of
processing apparatuses.



[0018] Since the present invention can be implemented
by software, the present invention encompasses
computer code provided to a general purpose computer
on any suitable carrier medium. The carrier medium can
comprise any storage medium such as a floppy disk, a
CD ROM, a magnetic device or a programmable memory
device, or any transient medium such as any signal,
e.g. an electrical, optical or microwave signal.
[0019] The invention can be more fully understood 
from the following detailed description when taken in 
conjunction with the accompanying drawings, in which: 

FIG. 1 shows an object region data creating apparatus
according to a first embodiment of the present
invention; 

FIG. 2 is a flowchart for processing in the object re- 
gion data creating apparatus according to the first 
embodiment; 

FIGS. 3A, 3B, and 3C are diagrams to help schematically
explain the process of describing the object
region in an image using object region data;

FIG. 4 is a diagram to help explain an example of
finding a function that approximates the value of the
X-coordinate of a reference representative point;

FIG. 5 is a diagram to help explain differential vectors
for depicting representative points other than
the reference representative point;

FIG. 6 is a diagram to help explain an example of
finding a function that approximates the values of
the X components of the differential vectors for depicting
representative points other than the reference
representative point;

FIG. 7 is a flowchart for the process of finding an
approximate function from the coordinates of representative
points or differential vectors;

FIG. 8 shows an example of the structure of object
region data;

FIG. 9 shows an example of the structure of the representative
point trajectory data in the object region
data;

FIG. 10 is a diagram to help explain another example
of differential vectors for depicting representative
points other than the reference representative
point;

FIGS. 11A and 11B are diagrams to help explain still
other examples of differential vectors for depicting
representative points other than the reference representative
point;

FIG. 12 is a diagram to help explain an example of
differential vectors between frames;

FIG. 13 shows another example of the structure of
the object region data;

FIG. 14 is a flowchart for the process of extracting
the object region at a given time from the object region
data;

FIG. 15 shows an object region data creating apparatus
according to a second embodiment of the
present invention;

FIG. 16 shows an example of the structure of the
representative point trajectory data in the object region
data according to a second embodiment;

FIG. 17 shows still another example of the structure
of the object region data;

FIG. 18 shows an example of the data structure of
depth information;

FIG. 19 is an illustration to help explain the measurement
of positional information in the direction of
depth;

FIG. 20 is a flowchart for the process of searching
for an object near the specified position;

FIGS. 21A and 21B are illustrations to help explain
the measurement of positional information in the direction
of depth;

FIG. 22 is a diagram to help explain the measurement
of positional information in the direction of
depth;

FIG. 23 is a diagram to help explain the measurement
of positional information in the direction of
depth;

FIG. 24 is a flowchart for the preprocess of deter- 
mining the time when the moving body exists at the 
specified distance; 

FIG. 25 is a flowchart for the process of determining 
the time when the moving body exists at the spec- 
ified distance; 

FIGS. 26A, 26B, and 26C are illustrations to help 
explain display flags according to a third embodi- 
ment; 

FIG. 27 is a diagram to help explain the creation of 
representative point trajectory data; 
FIG. 28 shows still another example of the structure 
of the object region data; 

FIG. 29 shows an example of the structure of the 
display flag information; 

FIG. 30 shows still another example of the structure 
of the representative point trajectory data in the ob- 
ject region data; 

FIG. 31 is a flowchart for the process of searching; 
FIGS. 32A, 32B, and 32C are diagrams to help ex- 
plain information about the object passing range ac- 
cording to a fourth embodiment; 
FIG. 33 shows an example of the structure of the 
information about the object passing range; 
FIG. 34 shows still another example of the structure
of the information about the object passing
range;

FIG. 35 is a flowchart for the process of selecting 
an object passing the specified coordinate; 
FIG. 36 is a flowchart for the procedure for process- 
ing by an object region data describing method us- 
ing mosaicking techniques according to a fifth em- 
bodiment; 

FIGS. 37A and 37B are diagrams to help explain 
the object region data describing method using mo- 
saicking techniques; 

FIG. 38 shows an example of the structure of the 
information relating to a coordinate conversion; 
FIGS. 39A, 39B, 39C, and 39D are diagrams show- 
ing a procedure for describing an object region in a 
video with object region data according to a fourth 
embodiment; 

FIG. 40 is a diagram showing an example of a proc- 
ess for approximating an object region with an el- 
lipse; 

FIG. 41 is a diagram showing an example of a proc- 
ess for detecting a representative point of an ap- 
proximate ellipse of an object region; 
FIG. 42 is a diagram showing an example of the 
structure of object region data; 
FIG. 43 is a diagram showing an example of the 
structure of data of an approximate figure in object 
region data; 

FIG. 44 is a diagram showing an example of the 
structure of data of a trajectory of a representative 
point in data of an approximate figure; 
FIG. 45 is a diagram showing another example of
the structure of data of an approximate figure in object
region data;

FIG. 46 is a diagram showing an example of repre- 
sentative points when the approximate figure is a 
parallelogram; 

FIG. 47 is a diagram showing an example in which 
the object region in a video is expressed with a plu- 
rality of ellipses; 

FIG. 48 is a diagram showing an example of the 
structure of object region data including data of a 
plurality of approximate figures; 
FIGS. 49A, 49B, and 49C are diagrams schemati- 
cally showing another process for describing an ob- 
ject region in a video with object region data; 
FIG. 50 is a flowchart showing an example of a pro- 
cedure for obtaining an approximate rectangle; 
FIG. 51 is a diagram showing a state in which an 
inclined and elongated object is approximated with 
a non-inclined rectangle; 

FIG. 52 is a flowchart showing an example of a pro- 
cedure for obtaining an approximate ellipse from an 
approximate rectangle; 

FIG. 53 is a diagram showing the first half of another 
example of the structure of object region data; 
FIG. 54 is a diagram showing the second half of the 
other example of the structure of object region data; 
FIG. 55 is a diagram showing still another example
of the structure of object region data;
FIG. 56 is a diagram showing a still further example 
of the structure of object region data; 
FIG. 57 shows an object region data creating appa- 
ratus according to a seventh embodiment of the 
present invention; 

FIG. 58 is a flow chart showing one example of 
processing procedure in the seventh embodiment; 
FIG. 59 is an explanatory view for one example of 
a method of calculating an object region optical 
flow; 

FIG. 60 is an explanatory view for another example 
of the method of calculating an object region optical 
flow; 

FIG. 61 is an explanatory view for an example of
expressing a conversion parameter by an approximate
temporal function;

FIG. 62 shows one example of an object region data 
description format if the reference object region is 
expressed by a bit map; 

FIG. 63 shows an example of the constitution of the 
object region data creating apparatus in the seventh 
embodiment; 

FIG. 64 is a flow chart showing another example of 
processing procedure in the seventh embodiment; 
FIG. 65 is an explanatory view for a method of making
the representative points of approximate figures
of object regions correspond to each other;

FIG. 66 shows the relationship between the types
of approximate figures and conversion models for
which conversion parameters can be obtained;

FIG. 67 shows one example of a description format
for the object region data if the reference object region
is approximated by a figure;

FIG. 68 shows one example of the description format
of object region data including sampling information;

FIG. 69 is an explanatory view for a state in which
one object is divided into regions having similar
movement by an optical flow; and

FIG. 70 shows one example of an object region data
description format for describing one object in a plurality
of regions.

[0020] A preferred embodiment of an object region
data describing method and an object region data creating
apparatus according to the present invention will
now be described with reference to the accompanying
drawings.

First Embodiment

[0021] FIG. 1 shows the configuration of an object region
data creating apparatus (or an object region data
converting system) according to the first embodiment of
the present invention.

[0022] As shown in FIG. 1, the object region data creating
apparatus comprises a video data storage device
100, a region extracting device 101, a region figure
approximating device 102, a figure representative point
extracting device 103, a representative point trajectory
function approximating device 104, and an object region
data storage device 106. It may further comprise a related
information storage device 105.

[0023] FIG. 2 is a flowchart for processing in the object
region data creating apparatus. 
[0024] The video data storage device 100, which 
stores video data, is composed of, for example, a hard 
disk, an optical disk, or a semiconductor memory. 

[0025] The region extracting device 101 extracts a
partial region of the video data (step S1).
The partial region is generally the object region, such as
a specific person, plant, animal, car, or building, in the
image. Anything in the video may be used as the object
region, as long as it can be treated as an object in the
video. The object may be an independent thing, part of
a thing (e.g., the head of a person, the hood of a car, or
the entrance of a building), or a set of things (e.g., a flock
of birds or a school of fish). In images, the same object
frequently appears on consecutive frames, whereas the
region corresponding to the same object often varies
from frame to frame mainly because of the movement
of the object itself and/or the movement of the camera
during shooting.

[0026] The region extracting device 101 extracts
the object region in each frame according to the
movement or transformation of the target object. As a
concrete extracting method, any one of the following
methods can be used: a method of manually specifying
the region all over the frames, a method of extracting
the contour of an object using an active
contour model called Snakes as described in M. Kass,
et al., "Snakes: Active contour models," International
Journal of Computer Vision, Vol. 1, No. 4, July, 1988,
pp. 321-331, a method of estimating the transformation
and movement of the whole of an object from the destination
of the movement of the partial region of the object
determined by block matching as described in
Kaneko, et al., "A fast moving body tracking method for
creating hypermedia content using robust estimation,"
Technical Report by Information Processing Society,
CVIM 113-1, 1998, and a method of determining the regions
having similar colors by the growth and division of
the region as described in "Image Analysis Handbook,"
Sect. 2, Chapter 2, Tokyo University Publishing House,
1991.

[0027] The region figure approximating device 102, 
using a specific figure, approximates the object region 
extracted by the region extracting device 101 (step S2).
[0028] Various types of figure, including a rectangle, 
a circle, an ellipse, and a polygon, can be used. 
The type of figure used in approximation may be determined
in advance. For example, the type of figure may
be specified by the user for each specific unit, such as
each of the objects to be approximated. Alternatively,
the type of figure may be selected automatically accord- 
ing to the shape or the like of each of the objects to be 
approximated. 
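
As one illustration of step S2 with a rectangle as the figure, the sketch below computes an axis-aligned rectangle enclosing the nonzero pixels of an extracted region mask. The mask representation and the function name are assumptions for illustration; the apparatus may use any of the figure types listed above.

```python
def bounding_rectangle(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels.

    mask is a list of rows of pixel values; nonzero means "inside the
    object region". Returns None when the region is empty.
    """
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

For an elongated object lying diagonally in the frame, such a non-inclined rectangle encloses a large area outside the object, which is one reason a different figure type may be preferred.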

[0029] There are various methods of approximating the
region. They include a method of approximating the region
using a circumscribed rectangle of the object region,
a method of approximating the region using a circumscribed
ellipse or inscribed ellipse for the rectangle
found by the preceding method, a method of approximating
the region using a circumscribed ellipse for the
object region, and a method using an approximate
polygon [...]. Still another
method is to approximate the region better using inclined
figures. There are further methods taking other
[...] coincide with the center of gravity of the [...]
to the value obtained by multiplying the area
of the object region [...].

[0030] The region figure approximating device 102
approximates the region frame by frame each time it receives
the result of the extraction at the region extracting
device 101. Alternatively, the figure may be approximated
using the result of extracting the regions in several
frames before and after the present frame. When the
result of extracting the regions in several frames is used,
it is possible to smooth the movement or transformation of
the approximate figure or make errors in region extraction
inconspicuous. The size of the approximate figure
may differ from frame to frame.

[0031] The figure representative point extracting device
103 extracts representative points depicting the approximate
figure outputted from the figure approximating
device 102 (step S3). Which points are set as representative
points depends on what approximate figure is
used. For example, when the approximate figure is a
rectangle, four or three vertexes can be set as representative
points. When the approximate figure is a circle,
the center and one point on the circumference, or
both ends of a diameter, can be set as representative
points. When the approximate figure is an ellipse,
the vertexes of a circumscribed rectangle for the ellipse
may be set as representative points (in this case, too,
three of the four vertexes are sufficient), or the two foci of
the ellipse and one point on the ellipse (e.g., one point
on the minor axis) may be set as representative points.
When any closed polygon is used as the approximate
figure, each vertex has only to be set as a representative
point.

[0032] Representative points are extracted in each
frame each time the figure approximating device 102
outputs information about the approximate figure for one
frame. Each representative point is expressed by a
coordinate on the horizontal axis X and a coordinate on
the vertical axis Y.

[0033] The representative point trajectory function
approximating device 104 approximates a time-series of
the positions of the representative points extracted at
the figure representative point extracting device 103 (or
the amounts that enable the points to be determined) by
a function (or approximate function) of time t (e.g., a time
stamp assigned to an image) or frame number f (step
S4). This function is expressed for each representative
point and differs in expression depending on whether it
is for the X-coordinate or the Y-coordinate.

[0034] When the number of representative points (or
the quantity that enables these points to be determined)
is n, a total of 2n approximate functions are created because
each representative point requires an X-coordinate
approximate function and a Y-coordinate approximate
function.

[0035] A straight line or a spline curve may be used
as a function representing a representative point trajectory.
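
The creation of 2n approximate functions for n representative points can be sketched as follows. This is a minimal illustration that assumes a single least-squares straight line per coordinate over the whole interval; the function names and the (a, b) parameter layout are hypothetical, and a spline with several segments would be needed for curved trajectories.

```python
def fit_line(ts, values):
    """Least-squares fit of value = a * t + b; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        return 0.0, mean_v  # only one distinct time value: constant function
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, values))
    a = cov / var_t
    return a, mean_v - a * mean_t


def fit_trajectories(ts, frames):
    """Fit one line per coordinate of every representative point.

    frames[k][i] is the (x, y) position of representative point i at time
    ts[k]. Returns 2n (a, b) pairs ordered x0, y0, x1, y1, ...
    """
    n_points = len(frames[0])
    functions = []
    for i in range(n_points):
        functions.append(fit_line(ts, [pts[i][0] for pts in frames]))
        functions.append(fit_line(ts, [pts[i][1] for pts in frames]))
    return functions
```

Only the fitted parameters need to be stored, so the amount of data no longer depends on the number of frames in which the object appears.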

[0036] The above series of processes are carried out
from the frame in which the target object appears to the
frame in which it disappears.

[0037] The determined approximate curve (including 
a straight line) is recorded as object region data accord- 
ing to a specific format in the object region data storage 
device 106.

[0038] The related information storage device 105,
which is provided if necessary, is for storing information
(related information) about the objects appearing in the
video data stored in the video data storage device 100
and pointer information (including addresses in which
related information has been recorded, file names, and
URLs) used to acquire the related information from an
external storage device or a server via a network. The
related information may be characters, sound, still pictures,
moving pictures, or a suitable combination of
them. Furthermore, the related information may be programs
or data that describes the operation of the computer
(in this case, when the object is specified by the
user, the computer carries out a desired operation). The
related information storage device 105 is composed of,
for example, a hard disk, an optical disk, or a semiconductor
memory, as is the video data storage device 100.
[0039] The object region data storage device 106 is a
storage medium in which object region data is stored, including
the data that represents a curve equation approximating
a time-sequential trajectory of the positions (or the quantity
that enables the positions to be determined) of the representative
points outputted from the representative
point trajectory function approximating device 104. With
the configuration including the related information storage
device 105, when the related information about the
object corresponding to the region expressed by the
function is stored in the related information storage device
105, the related information itself and the addresses
in which the related information has been recorded
can also be recorded in the object region data (when
information about the addresses in which the related information
has been recorded is stored in the related information
storage device 105, the address information
can also be recorded). The object region data storage
device 106 is composed of, for example, a hard disk, an
optical disk, or a semiconductor memory, as is the video
data storage device 100.
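
To make the stored content more concrete, the sketch below models one object region record. It is not the format of FIG. 8. All class and field names are hypothetical, and each coordinate function is reduced to a straight line for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CoordinateFunction:
    """One approximate function of frame number f: value = a * f + b."""
    a: float
    b: float

    def at(self, frame: int) -> float:
        return self.a * frame + self.b


@dataclass
class ObjectRegionRecord:
    """Hypothetical record for one object; field names are illustrative."""
    first_frame: int   # frame in which the object appears
    last_frame: int    # frame in which the object disappears
    figure_type: str   # e.g. "rectangle", "ellipse", "polygon"
    # One (x_function, y_function) pair per representative point.
    trajectories: List[Tuple[CoordinateFunction, CoordinateFunction]] = field(
        default_factory=list
    )
    related_info: Optional[str] = None  # related information or a pointer to it

    def points_at(self, frame: int):
        """Representative points at the given frame, or None outside the range."""
        if not (self.first_frame <= frame <= self.last_frame):
            return None
        return [(fx.at(frame), fy.at(frame)) for fx, fy in self.trajectories]
```

A record of this kind lets a player recover the region at any frame by evaluating the functions, and then follow `related_info` when the user specifies the object.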

[0040] The video data storage device 100, related information
storage device 105, and object region data
storage device 106 may be composed of separate storage
devices. Alternatively, all of them or part of them
may be composed of the same storage device. 
[0041] Such an object region data creating apparatus 
may also be realized by executing software on the com- 
puter. 

[0042] In the processing on the object region data creating
apparatus (particularly, the processing at the region
extracting device 101 or at the figure approximating
device 102), when the user is allowed to operate the system,
a GUI is used to display the video data in, for example,
frames and to enable the user to input instructions
(this part is omitted from FIG. 1).
[0043] Using a more concrete example, the operation 
of the object region data creating apparatus will be ex- 
plained. 

[0044] Explanation will be given showing an example
of approximating the object region with a polygon (with
the vertexes of an approximate polygon as representative
points) and using a second order polynomial spline
function as an approximate function. In an example of
using a polygon as an approximate figure in the following
explanation, the vertexes of the polygon generally
mean representative points.
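
A second order polynomial segment of such a function can be illustrated as below. The sketch builds the parabola through three samples and joins segments into a piecewise function. How knots are chosen and how the fit is computed in the embodiment is not reproduced here, and the function names are hypothetical.

```python
def quadratic_through(p0, p1, p2):
    """Return f(t) for the parabola through three (t, value) samples.

    The three t values must be distinct (Lagrange interpolation).
    """
    (t0, v0), (t1, v1), (t2, v2) = p0, p1, p2

    def f(t):
        return (
            v0 * (t - t1) * (t - t2) / ((t0 - t1) * (t0 - t2))
            + v1 * (t - t0) * (t - t2) / ((t1 - t0) * (t1 - t2))
            + v2 * (t - t0) * (t - t1) / ((t2 - t0) * (t2 - t1))
        )

    return f


def piecewise(segments):
    """Join segments (t_start, t_end, f), sorted by time, into one function."""

    def g(t):
        for t_start, t_end, f in segments:
            if t_start <= t <= t_end:
                return f(t)
        raise ValueError("t is outside every segment")

    return g
```

For example, one such piecewise function would describe the X-coordinate of a single polygon vertex over time, and another its Y-coordinate.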

[0045] FIGS. 3A to 3C are diagrams to help give an
outline of a series of processes ranging from the process
of extracting the object region with the region extracting
device 101, the process of approximating the region using
figures with the figure approximating device 102, and the
process of extracting the representative points of the figure
with the figure representative point extracting device
103, to the process of approximating the representative
point trajectory using a function by means of the representative
point trajectory function approximating device
104.

[0046] In FIG. 3A, numeral 200 indicates one frame
in an image to be processed.

[0047] Numeral 201 indicates the object region to be
extracted. The process of extracting the region 201 of
the object is carried out at the region extracting device 101.
[0048] Numeral 202 indicates an approximate polygon
obtained by approximating the object region using
a polygon. The process of finding the approximate polygon
202 from the object region 201 is carried out at the
figure approximating device 102.
[0049] FIG. 3B illustrates representative points of the approximate
figure over a plurality of frames, or the change of the vertexes of
the approximate polygon 202 in the example and an approximate curve
of those vertexes.

[0050] In the first embodiment, a specific representative point
selected from a plurality of representative points on the approximate
figure is called a reference representative point, which is denoted
by V_0 (the reference representative point is supposed to be the same
over all the frames). In the embodiment, let any one of a plurality
of vertexes of the approximate polygon 202 be the reference
representative point V_0.
[0051] There are various selecting methods. They include a method of
selecting the point having the largest or smallest X-coordinate or
Y-coordinate and a method of selecting the top right point, bottom
right point, bottom left point, or top left point.

[0052] In the second and later frames, the reference representative
point V_0 is selected by judging which one of a plurality of
representative points in the present frame corresponds to the
reference representative point V_0 in the preceding frame.
[0053] There are various methods of judging which representative
point corresponds to the reference representative point V_0 in the
preceding frame. For example, they include a method of setting, as
the reference representative point V_0, the point in the present
frame closest to the reference representative point V_0 in the
preceding frame, a method of setting, as the reference






representative point V_0, the point in the present frame closest to
the reference representative point V_0 in the preceding frame when
the center of gravity of the approximate figure in the preceding
frame is caused to coincide with the center of gravity of the
approximate figure in the present frame, a method of finding the
reference representative point V_0 in the present frame by checking a
plurality of representative points of the approximate figure in the
preceding frame against a plurality of representative points of the
approximate figure in the present frame, and a method of finding the
reference representative point V_0 in the present frame by checking
the video data in the region of the target object in the preceding
frame against the video data in the present frame.
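
The second judging method above (nearest point after the centers of gravity are made to coincide) can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the mean of the vertex coordinates is assumed as a simple stand-in for the center of gravity of the approximate figure.

```python
from math import hypot


def centroid(points):
    """Mean of the vertex coordinates (assumed stand-in for the center of gravity)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def match_reference_point(prev_points, prev_ref_index, cur_points):
    """Return the index in cur_points of the point closest to the previous
    reference representative point once both centroids coincide."""
    pcx, pcy = centroid(prev_points)
    ccx, ccy = centroid(cur_points)
    px, py = prev_points[prev_ref_index]
    # Previous reference point shifted by the centroid displacement.
    tx, ty = px - pcx + ccx, py - pcy + ccy
    return min(
        range(len(cur_points)),
        key=lambda i: hypot(cur_points[i][0] - tx, cur_points[i][1] - ty),
    )


prev_frame = [(0, 0), (4, 0), (4, 3), (0, 3)]
# Same rectangle moved by (10, 0), with its vertexes listed in another order.
cur_frame = [(14, 3), (10, 3), (10, 0), (14, 0)]
ref_index = match_reference_point(prev_frame, 0, cur_frame)
```

Aligning the centroids first removes the overall translation, so the nearest-point search compares only the shape-relative positions.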

[0054] Methods of causing representative points oth- 
er than the reference representative point V 0 to corre- 
spond to those in adjacent frames include methods sim- 
ilar to those described above and a method of causing 
other representative points to correspond to those in the 
adjacent frames, using the reference representative 
point as the starting point. 

[0055] These processes are carried out at the repre- 
sentative point extracting device 103. 
[0056] The representative point trajectory function approximating
device 104 finds an approximate function
expressing the trajectory 203 from the coordinates of the 
reference representative point V 0 in each frame inputted 
one after another. In FIG. 3B, numeral 203 indicates the 
trajectory obtained by connecting moving locations of 
the reference representative point V 0 in individual 
frames. 

[0057] The coordinates of the reference representa- 
tive point V 0 include the X-coordinate and Y-coordinate. 
Each of the coordinates is approximated independently 
by a function of time t or frame number f . 
[0058] Numeral 204 in FIG. 3C indicates an example 
of the function found for the reference representative 
point V 0 (in this case, only X-coordinate axis for the ref- 
erence representative point V 0 is shown). This example 
shows a case where the approximate section is divided 
into two, t = 0 to 5 and t = 5 to 16.
[0059] FIG. 4 shows an example of finding a function
for approximating the value of the X-coordinate of the 
reference representative point V 0 . In FIG. 4, numeral 
301 indicates the time section where the object exists. 
The black point 302 represents the value of the X-coor- 
dinate of the reference representative point V 0 . Numeral 
303 indicates its approximate function. As for the Y-co- 
ordinate, an approximate function is found in the same 
manner. Since polynomial spline functions are used as 
approximate functions, a polynomial is defined for each 
of the time sections divided at points called knots. In this 
case, each of t = 0, t = 5, and t = 16 is a knot time.
[0060] As for representative points other than the ref- 
erence representative point V 0 of the approximate fig- 
ure, their approximate functions can be found and re- 
corded in the same manner as described above. 



[0061] Representative points other than the reference representative
point V_0 may be represented using the relative relationship with
other representative points, or using differential vectors. They are
described by the trajectory of the vectors.

[0062] Hereinafter, explanation will be given showing 
an example of describing representative points other 
than the reference representative point V 0 using the tra- 
jectory of a vector from an adjacent representative point. 

[0063] FIG. 5 is a diagram to help explain one vertex,
the reference representative point V 0 , and individual dif- 
ferential vectors representing the other vertexes. 
[0064] The individual vertexes other than the reference
representative point V_0 are denoted by V_1, V_2, ..., V_{M-1},
starting from the reference representative point V_0 in a
predetermined order, for example, clockwise. Here, M is the number of
vertexes. Since the figure in FIG. 5 is a pentagon, this gives M = 5.
The vector from vertex V_0 to V_1 is denoted by V_{0,1}. Vectors
V_{1,2}, V_{2,3}, ..., V_{M-2,M-1} are determined in the same manner.
Each vector has the values (relative position data) of the X
component and Y component.

[0065] A string of black points 502 in FIG. 6 represents the value of
the X-component of vector V_{0,1} at each time.

[0066] The process of finding these vectors is carried out at the
representative point extracting device 103.
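
As a minimal sketch of how the differential vectors between adjacent vertexes could be computed for one frame (the function name and the sample pentagon are illustrative assumptions, not taken from the specification):

```python
def differential_vectors(vertices):
    """Given [V_0, V_1, ..., V_{M-1}] as (x, y) tuples, return
    [V_{0,1}, V_{1,2}, ..., V_{M-2,M-1}] as (dx, dy) tuples."""
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])
    ]


pentagon = [(0, 0), (2, 1), (3, 3), (1, 4), (-1, 2)]   # M = 5, hypothetical data
vectors = differential_vectors(pentagon)

# The same pentagon after a parallel translation by (5, 7).
moved = [(x + 5, y + 7) for x, y in pentagon]
vectors_moved = differential_vectors(moved)
```

For a pure parallel translation the two vector lists are identical, which is why the vector trajectories are easy to approximate for rigid objects.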
[0067] The representative point trajectory function approximating
device 104 calculates an approximate function 503 that expresses the
trajectory of the component values of each vector.
[0068] When the shape of the object hardly changes and the movement
of the object is close to parallel translation, the values of vectors
V_{0,1}, V_{1,2}, ..., V_{M-2,M-1} do not change much. As a result,
the difference between the approximate function and the actual values
becomes smaller, which makes it possible to expect an improvement in
the efficiency in describing the object region. If the shape of the
object does not change and the movement of the object is completely
parallel translation, the values of vectors V_{0,1}, V_{1,2}, ...,
V_{M-2,M-1} do not change at all. As a result, the approximate
function makes a straight line and approximation errors are zero.
[0069] FIG. 7 is a flowchart for an example of the process of finding
an approximate function for the coordinates of the representative
points or the component values of the differential vectors from the
coordinates of the representative points (in this example, the
vertexes of the approximate polygon for the object region) inputted
one after another to the representative point trajectory function
approximating device 104 or from the component values of the
differential vectors.
[0070] Here, let the time corresponding to the i-th frame be t_i
(i = 0, 1, ...). Moreover, let v^{(0)}_t be the X-coordinate of V_0
at time t and let v^{(j)}_t (j = 1, 2, ..., M-1) be the X-component
value of V_{j-1,j} at time t. In addition, let the largest of the
times t corresponding to the knots of the determined spline function
be t_k.
[0071] First, at step S601, the initial setting of t_k and i is done.

[0072] At step S602, an approximate function of v^{(j)}_t (in the
first embodiment, a quadratic polynomial) is found over the section
ranging from t_k to t_i, each corresponding to a knot. A method of
finding an approximate function by least squares is most widely used.
In this case, however, a condition that the approximate function
passes knots must be added. The reason is that, without this
condition, a polynomial spline function becomes discontinuous at
knots. In FIG. 7, the approximate function of v^{(j)}_t found over
the section ranging from t_a to t_b is denoted by
F^{(j)}_{t_a,t_b}(t).

[0073] At step S603, the error e^{(j)} (j = 0, 1, ..., M-1) of the
approximate function is calculated. The approximation error is
calculated using the following equation:

e^{(j)} = \max_{k \le h \le i} \left| v^{(j)}_{t_h} - F^{(j)}_{t_k,t_i}(t_h) \right|

where the range of h taken into account is k ≤ h ≤ i.






[0074] At step S604, it is determined whether or not 
the approximation error is within a permitted limit. The 
range of the allowed errors may be set to the same value 
for all the vertexes. Alternatively, each vertex may be 
permitted in a different range. If any one exceeds the 
allowed error range, control proceeds to step S605. If 
all the vertexes are within the allowed error range, con- 
trol goes to step S606. 

[0075] At step S605, the approximate function for the section ranging
from t_k to t_{i-1} is determined to be F^{(j)}_{t_k,t_{i-1}}(t)
(j = 0, 1, ..., M-1) and the parameter "k" is set to "i-1".
[0076] At step S606, the value of i is incremented by 
one. Thus, the same approximate function is applied for 
a section in which the error is within an allowable limit 
and a new approximate function is found if the error is 
not within the allowable limit. 

[0077] At step S607, an end judging process is carried out. If the
coordinate (or the component value of its difference vector) of a new
representative point is not inputted, the process is completed. If
the coordinate (or the component value) of a representative point is
inputted, the processes at step S602 and forward are carried out
again.

[0078] If the end determination is affirmative, at step S608, the
approximate function for the section ranging from t_k to t_{i-1} is
determined to be F^{(j)}_{t_k,t_{i-1}}(t) (j = 0, 1, ..., M-1).
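
The loop of steps S602 to S608 can be sketched for a single component sequence as follows. This is a simplified illustration under stated assumptions, not the apparatus itself: the function names and the tolerance `tol` are hypothetical, only one sequence v_t is handled (the flowchart evaluates all j together), each quadratic is forced through its starting knot only, and the value stored at the closing knot is assumed to be the fitted value there so that adjacent sections stay continuous.

```python
def fit_from_knot(ts, vs, t0, v0):
    """Least-squares quadratic f(t) = v0 + b*(t - t0) + a*(t - t0)**2 that is
    forced through the start knot (t0, v0). Returns (a, b)."""
    s = [t - t0 for t in ts]
    d = [v - v0 for v in vs]
    s2 = sum(x ** 2 for x in s)
    r1 = sum(x * y for x, y in zip(s, d))
    if len(ts) < 2:                       # one sample: a straight line is exact
        return 0.0, r1 / s2
    s3 = sum(x ** 3 for x in s)
    s4 = sum(x ** 4 for x in s)
    r2 = sum(x * x * y for x, y in zip(s, d))
    det = s4 * s2 - s3 * s3               # determinant of the normal equations
    return (r2 * s2 - r1 * s3) / det, (s4 * r1 - s3 * r2) / det


def evaluate(a, b, t0, v0, t):
    return v0 + b * (t - t0) + a * (t - t0) ** 2


def approximate_trajectory(times, values, tol):
    """Return (knots, sections); knots is a list of (time, value) and
    sections[n] = (a, b) is the quadratic between knots n and n + 1."""
    knots = [(times[0], values[0])]
    sections = []
    k, prev, i = 0, None, 1
    while i < len(times):
        t0, v0 = knots[-1]
        a, b = fit_from_knot(times[k + 1:i + 1], values[k + 1:i + 1], t0, v0)  # S602
        err = max(abs(values[h] - evaluate(a, b, t0, v0, times[h]))            # S603
                  for h in range(k, i + 1))
        if err > tol and prev is not None:                                     # S604
            # S605: close the section at t_{i-1} with the last accepted fit.
            sections.append(prev)
            knots.append((times[i - 1], evaluate(*prev, t0, v0, times[i - 1])))
            k, prev = i - 1, None
            continue
        prev = (a, b)
        i += 1                                                                 # S606
    if prev is not None:                                                       # S608
        t0, v0 = knots[-1]
        sections.append(prev)
        knots.append((times[-1], evaluate(*prev, t0, v0, times[-1])))
    return knots, sections


# Hypothetical samples: a parabola for t = 0..5, then a straight line up to t = 16.
times = list(range(17))
values = [float(t * t) if t <= 5 else 25.0 + 10.0 * (t - 5) for t in times]
knots, sections = approximate_trajectory(times, values, tol=1e-6)
```

With these samples the sketch places knots at t = 0, 5, and 16, mirroring the division of the approximate section shown in FIG. 3C.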

[0079] Although only the X-coordinate has been explained in FIG. 7,
the same holds true for the Y-coordinate. In judging errors, errors
may be evaluated simultaneously for all the X-coordinates and
Y-coordinates of the individual vertexes.
[0080] The process at the representative point trajectory function
approximating device 104 may be carried out each time the coordinates
(component values) of the representative points of a frame for the
object region are obtained, or after the coordinates (component
values) of the representative points of all the frames for the object
region have been obtained. In the former case, for example,
approximation is made each time the coordinates (component values) of
the representative points for a frame are obtained and an
approximation error is determined simultaneously; knots are provided
in such a manner that the approximation error lies in a specific
range, thereby dividing the approximation section suitably.
[0081] When the representative point trajectory data for the object
region is created, the knots may be made the same for the coordinates
of all the representative points. For example, when the coordinates
(or component values) of the representative points are approximated,
if a knot is provided for one representative point because its error
exceeds the allowable value, the same knot is forcibly provided for
all the other representative points in the approximating process.
[0082] The approximate function thus obtained, such as a spline
function, is recorded in the object region data storage device 106
according to a predetermined data format.

[0083] Hereinafter, the format of the object region data stored in
the object region data storage device 106
will be explained. Explanation will be given using a case 
where representative points are approximated by a 
spline function. Representative points may be approxi- 
mated by another suitable function. 
[0084] FIG. 8 shows an example of the format of the 
object region data. 

[0085] Figure type ID 700 determines the type of the 
figure used in approximating the object region. For in- 
stance, the center of gravity (centroid), rectangle, el- 
lipse, or polygon can be specified. 
[0086] Number of representative points 703 indicates the number of
representative points determined by the figure type.

[0087] Representative point trajectory 704 describes the trajectory
of a representative point. There are as many trajectories as the
number of representative points M. When representative points other
than the reference representative point V_0 are described by the
trajectory from an adjacent representative point, the trajectory of
the reference representative point V_0 is described in the first
representative point trajectory (1) 704; the trajectory of V_{0,1} is
described in the second representative point trajectory (2) 704; the
trajectory of V_{1,2} is described in the third representative point
trajectory (3) 704; and the trajectory of V_{M-2,M-1} is described in
the M-th representative point trajectory (M) 704.
[0088] When approximate functions are found for representative points
other than the reference representative point V_0 in the same manner
as the reference representative point, the trajectory of V_0 is
described in the first representative point trajectory (1) 704; the
trajectory of V_1 is described in the second representative point
trajectory (2) 704; the trajectory of V_2 is described in the third
representative point trajectory (3) 704; and the trajectory of
V_{M-1} is described in the M-th representative point trajectory (M)
704.

[0089] Object appearing time 701 is the time when the desired object
appeared. Object existing time period 702 is the length of time
during which the object existed. Object disappearing time may be
substituted for object existing time period 702. Both object
appearing time and object existing time period may be described by
frame number and the number of frames instead of time. Since
information about object appearing time 701 and object existing time
period 702 can also be obtained from the knot time in representative
point trajectory 704, they need not necessarily be described.
[0090] The object appearing time/object appearing frame, object
existing time period/object existing frame, and object disappearing
time/object disappearing frame may be determined by the frames in
which the object actually appeared or disappeared in the image.
Alternatively, for example, any frame number after the appearance of
the object in the image may be set as the start frame number and any
frame number after the start frame number and before the one in which
the object disappeared in the image may be set as the end frame
number.
[0091] The object region data item

[0092] A single object may be approximated by a plurality of
approximate figures. In this case, the object region data includes,
for example, as many figure type IDs, numbers of representative
points, and representative point trajectories as the number of
figures used in approximation.

[0093] FIG. 9 is a concrete example of the data format 
of the representative point trajectory. 
[0094] Number of knots 800 indicates the number of knots of a spline
function that expresses a representative point trajectory. The frame
corresponding to each knot is expressed in time and stored in knot
time 801. Since there are as many knot times as the number of knots,
they are described in an arrangement form 802. Similarly, the value
of the X-coordinate of each knot (or the quantity that enables the
coordinate, such as the x-component value of its difference vector,
to be determined) and the value of the Y-coordinate of each knot (or
the quantity that enables the coordinate, such as the y-component
value of its difference vector, to be determined) are described in
the form of an arrangement 804 of X-coordinate of knots 803 and an
arrangement 806 of Y-coordinate of knots 805, respectively.
[0095] Linear function flag 807 indicates whether only linear
functions are used as spline functions between knots. When a
quadratic polynomial is partially used, this flag is set off. Use of
the flag 807 makes it unnecessary to describe any piece of function
specifying information 808, 812, which will be explained below, when
only linear functions are used as approximate functions. This helps
decrease the amount of data. The flag is not necessarily used.

[0096] Function ID 809, 813 and function parameter 810, 814 included
in the function specifying information 808, 812 indicate the degree
of each polynomial spline function and information for determining
its coefficient, respectively. For example, when a linear polynomial
is used, 1 is set; and when a quadratic polynomial is used, 2 is set
(of course, the highest degree of a polynomial may be set to degree 3
or higher). Since information about only knots is sufficient in using
a linear polynomial, function parameters are not described. When a
quadratic polynomial is used, a single value for determining a
coefficient (for example, a quadratic coefficient or the coordinate
of one point other than the knots on the quadratic curve (the
component value when differential vectors are used)) is described in
a function parameter. There are as many pieces of function specifying
information as the number of knots minus one. They are described in
arrangement form 811, 815.
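
The representative point trajectory format of FIG. 9 can be pictured as a small data structure. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, the assignment of function specifying information 808 to the X-coordinate and 812 to the Y-coordinate is assumed, and the linear function flag 807 is derived from the stored function IDs rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionSpec:
    """Function specifying information for one section between two knots."""
    function_id: int                    # 1 = linear, 2 = quadratic
    parameter: Optional[float] = None   # e.g. quadratic coefficient; unused if linear


@dataclass
class RepresentativePointTrajectory:
    knot_times: List[float]             # knot time 801 (arrangement 802)
    knot_x: List[float]                 # X-coordinate of knots 803 (arrangement 804)
    knot_y: List[float]                 # Y-coordinate of knots 805 (arrangement 806)
    x_functions: List[FunctionSpec] = field(default_factory=list)
    y_functions: List[FunctionSpec] = field(default_factory=list)

    @property
    def linear_only(self) -> bool:
        """Corresponds to linear function flag 807."""
        return all(f.function_id == 1 for f in self.x_functions + self.y_functions)

    def validate(self) -> None:
        n = len(self.knot_times)        # number of knots 800
        if len(self.knot_x) != n or len(self.knot_y) != n:
            raise ValueError("one X and one Y value are needed per knot")
        for funcs in (self.x_functions, self.y_functions):
            # When only linear functions are used the list may be omitted.
            if funcs and len(funcs) != n - 1:
                raise ValueError("one function spec is needed per section")


linear = RepresentativePointTrajectory([0, 5, 16], [10, 20, 15], [3, 3, 8])
mixed = RepresentativePointTrajectory(
    [0, 5, 16], [10, 20, 15], [3, 3, 8],
    x_functions=[FunctionSpec(2, 0.5), FunctionSpec(1)],
    y_functions=[FunctionSpec(1), FunctionSpec(1)],
)
linear.validate()
mixed.validate()
```

Leaving the function lists empty in the all-linear case reflects the data saving that the flag 807 is meant to provide.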
[0097] In the methods explained above, to describe 
representative points other than the reference repre- 
sentative point V 0 , the differential vectors from adjacent 
representative points are found and converted into ap- 
proximate functions. In addition to this method, there is 
a method of using differential vectors from the reference 
representative point V 0 . 

[0098] For example, as shown in FIG. 10, vector V_{0,j} from V_0 to
V_j is calculated for a representative point V_j (in this case, each
vertex of the approximate polygon) other than the reference
representative point V_0. Then, in the process of FIG. 7, v^{(j)}_t
(j = 1, 2, ..., M-1) is replaced with the component value of V_{0,j}
at time t.
[0099] This method has the advantage that, since any 
representative point other than the reference represent- 
ative point V 0 can be described by the reference repre- 
sentative point V 0 and a single vector, errors in the val- 
ues obtained from the descriptive data are not accumu- 
lated. 
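
The difference in error behavior between the two descriptions can be illustrated with a small numeric sketch. The data and the uniform error `eps` are hypothetical; the point is only that errors add up along a chain of adjacent vectors but not when every vector starts at V_0.

```python
def from_adjacent(v0, vecs):
    """Rebuild vertexes from V_0 and the chain V_{0,1}, V_{1,2}, ..."""
    pts = [v0]
    for dx, dy in vecs:
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts


def from_reference(v0, vecs):
    """Rebuild vertexes from V_0 and the vectors V_{0,1}, V_{0,2}, ..."""
    return [v0] + [(v0[0] + dx, v0[1] + dy) for dx, dy in vecs]


true_pts = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
eps = 0.5   # hypothetical error in the X component of every decoded vector

adjacent = [(b[0] - a[0] + eps, b[1] - a[1]) for a, b in zip(true_pts, true_pts[1:])]
reference = [(p[0] - true_pts[0][0] + eps, p[1] - true_pts[0][1]) for p in true_pts[1:]]

err_adjacent = [p[0] - q[0] for p, q in zip(from_adjacent(true_pts[0], adjacent), true_pts)]
err_reference = [p[0] - q[0] for p, q in zip(from_reference(true_pts[0], reference), true_pts)]
```

Here the chained description drifts further with each vertex, while the description from V_0 keeps every vertex within the single-vector error.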

[0100] There is another method of finding half of the vectors
clockwise, starting from the reference representative point V_0, and
the remaining half of the vectors counterclockwise, as shown in FIG.
11A. Still another method is to provide a plurality of representative
points expressed by vectors from the reference representative point
V_0 and then find vectors between adjacent vectors, as shown in FIG.
11B.

[0101] When the number of representative points of an approximate
figure is a (a ≥ 3), two or more and (a - 1) or fewer of the
representative points may be set as reference representative points,
and the remaining one or more representative points may be expressed
by differential vectors from those reference representative points.

[0102] In these cases, there are as many representative point
trajectories 704 in the object region data of FIG. 8 as the number of
representative points M.
[0103] The method of expressing representative points other than the
reference representative point in various ways using the reference
representative point in individual frames as the basic point has been
explained. Hereinafter, a method of describing the object region by
expressing the movement of a representative point by vectors in
consecutive frames and converting the trajectory of these vectors
into an approximate function will be explained.

[0104] In FIG. 12, numeral 1100 indicates an object approximate
figure (polygon) in the initial frame. Numeral 1102 indicates an
object approximate figure in the frame at time t. Numeral 1101
indicates an object approximate figure just before 1102. Numeral 1103
indicates one of the representative points of the object region at
time t. Numeral 1104 indicates the representative point corresponding
to the representative point 1103 in the preceding frame. Numeral 1105
indicates a motion vector from the representative point 1104 to the
representative point 1103, representing the movement of a
representative point in the frame at time t. Since the motion vector
is obtained at each time corresponding to each frame, it is possible
to perform approximation using a function of time t as described
above.
[0105] A method of describing the object region is to execute the
flowchart of FIG. 7 using v^{(j)}_t (j = 0, 1, ..., M-1) as the
component values of V'_j at time t. Here, let the motion vector of
V_j at time t be V'_j. In this method, the motion vector of V_0 is
found in the same manner as those of the other representative points
and converted into an approximate function.
[0106] In a method using such a motion vector, the coordinates of all
the representative points of the approximate figure in the frame
where an object appeared have to be described. Accordingly, the data
format (corresponding to the example of FIG. 8) described in this
method is as shown in FIG. 13. The data format of FIG. 13 differs
from that of FIG. 8 in that representative point initial position
1200 is added. In the representative point initial position 1200, the
coordinates of M representative points in the initial frame are
described. In this case, the coordinates of all the representative
points have only to be described directly. Another method is to
describe only the coordinate of one representative point directly and
further describe the coordinates of the other representative points
using differential vectors from adjacent representative points as
shown in, for example, FIG. 5. Still another method is to describe
representative points using differential vectors from one
representative point V_0 as explained in FIG. 10.
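
A minimal sketch of the motion-vector description follows, assuming the correspondence of representative points between frames has already been resolved. The function names and sample frames are hypothetical, and the exact motion vectors are used in place of their approximate functions.

```python
def motion_vectors(frames):
    """frames[f][j] is the (x, y) position of representative point j in frame f.
    Returns, for each pair of consecutive frames, the motion vector of every
    representative point."""
    return [
        [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]


def positions_from_motion(initial, motions):
    """Rebuild every frame from the representative point initial position and
    the per-frame motion vectors."""
    frames = [list(initial)]
    for step in motions:
        frames.append(
            [(x + dx, y + dy) for (x, y), (dx, dy) in zip(frames[-1], step)]
        )
    return frames


frames = [
    [(0, 0), (2, 0), (1, 2)],
    [(1, 0), (3, 1), (2, 3)],
    [(3, 1), (5, 2), (4, 4)],
]
mv = motion_vectors(frames)
rebuilt = positions_from_motion(frames[0], mv)
```

The rebuild step shows why the initial positions must be stored: the motion vectors alone fix the shape of the movement but not where it starts.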

[0107] Still another method of describing the object region data is
to find directly the motion vector from the position of the initial
representative point to the position of a representative point at
time t and convert the motion vector into an approximate function.
[0108] Next, a method of extracting the object region at given time T
from information about the object region described in the object
region data will be explained. This process is executed at an
information processing system that handles video data and its object
region data. Such an information processing system can, of course, be
realized by executing software on a computer.
[0109] FIG. 14 is a flowchart for an example of the 
process in that case. 

[0110] The following explanation assumes that representative points
other than the reference representative point V_0 are described using
the trajectory of a vector from an adjacent representative point.
[0111] At step S901, it is determined whether an object exists at a
given time T. The determination can be made easily by referring to
the object appearing time 701 and object existing time period 702. If
no object exists at time T, this means that there is no object
region. Thus, the process is ended immediately.
[0112] At step S902, the approximate function F^{(j)}_{t_a,t_b}(t)
(j = 1, 2, ..., M-1) at time T is reconstructed. Here, let the times
of the knots on both sides of time T be t_a and t_b. The approximate
function can be reconstructed using the coordinates (or the component
values of its difference vector) at t_a and t_b described at
X-coordinate of knot 803 or Y-coordinate of knot 805, function ID
809, 813, and function parameter 810, 814, as shown in FIG. 9. That
is, when a linear polynomial is used as the approximate function, it
can be obtained as a straight line passing through the two knots.
When a quadratic polynomial is used and a quadratic coefficient is
described in the function parameter, the quadratic coefficient is
determined from the value of the function parameter and the
coefficients of lower than second order are determined in such a
manner that the curve passes through the knots.
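
One way to carry out this reconstruction is to write the section function as the straight line through the two knots plus a term that vanishes at both knots, so that the stored quadratic coefficient fixes the curve. The sketch below assumes that the function parameter holds the quadratic coefficient; the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def evaluate_section(t, ta, va, tb, vb, function_id=1, quad_coeff=0.0):
    """Value at time t of the function between knots (ta, va) and (tb, vb)."""
    line = va + (vb - va) * (t - ta) / (tb - ta)
    if function_id == 1:
        return line
    if function_id == 2:
        # (t - ta) * (t - tb) is zero at both knots, so the curve still
        # passes through them while its t**2 coefficient equals quad_coeff.
        return line + quad_coeff * (t - ta) * (t - tb)
    raise ValueError("unsupported function ID")


def evaluate_trajectory(T, knot_times, knot_values, specs):
    """specs[n] = (function_id, quad_coeff) for the section between knots n and n + 1."""
    if not knot_times[0] <= T <= knot_times[-1]:
        raise ValueError("T lies outside the period covered by the knots")
    n = min(bisect_right(knot_times, T) - 1, len(knot_times) - 2)
    fid, coeff = specs[n]
    return evaluate_section(T, knot_times[n], knot_values[n],
                            knot_times[n + 1], knot_values[n + 1], fid, coeff)


# Hypothetical trajectory: 2*t**2 - 3*t + 1 on [0, 5], then a line on [5, 16].
knot_times = [0, 5, 16]
knot_values = [1.0, 36.0, 58.0]
specs = [(2, 2.0), (1, 0.0)]
x_at_2 = evaluate_trajectory(2, knot_times, knot_values, specs)
```

In this form no lower-order coefficients need to be stored, since they follow from the two knot values.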

[0113] At step S903, t = T is substituted into the approximate
function, thereby finding the coordinate of V_0 at time T and the
component values of V_{0,1}, V_{1,2}, ..., V_{M-2,M-1}.

[0114] Finally, at step S904, V_0 and V_{0,1}, V_{1,2}, ...,
V_{M-2,M-1} are added one after another, thereby calculating the
coordinates of V_0, V_1, ..., V_{M-1}.
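
The addition in step S904 amounts to a running sum. A minimal sketch (the function name and sample values are hypothetical):

```python
def restore_vertices(v0, diffs):
    """Add V_{0,1}, V_{1,2}, ... to V_0 one after another and return
    [V_0, V_1, ..., V_{M-1}]."""
    vertices = [v0]
    for dx, dy in diffs:
        x, y = vertices[-1]
        vertices.append((x + dx, y + dy))
    return vertices


restored = restore_vertices((0, 0), [(2, 1), (1, 2), (-2, 1), (-2, -2)])
```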
[0115] On the basis of the representative points obtained in this
way, the information processing system can carry out various
processes. They include the process of creating a figure that
approximates the object region, the process of showing the user the
target object by depicting the region of the approximate figure in
the object's video data in a specific representation form, and the
process of, when the user specifies a position in the image on the
screen with a pointing device, such as a mouse, judging that the
target object has been specified if the approximate figure of the
object region at that time (field) exists and the specified position
is within the approximate figure.
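
For a polygonal approximate figure, the inside/outside judgment can be made with a standard ray-casting test. The specification does not state which test is used, so the following is only one possible sketch with a hypothetical function name.

```python
def point_in_polygon(px, py, polygon):
    """Ray casting: count how many polygon edges a horizontal ray starting at
    (px, py) and running toward +X crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > py) != (y1 > py):                 # edge spans the ray's height
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
clicked_inside = point_in_polygon(2, 2, square)
clicked_outside = point_in_polygon(5, 2, square)
```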

[0116] For example, when related information is at- 
tached to the object region data of FIG. 8, or when a 
database including related information about individual 






objects exists independently from the object region da- 
ta, the related information is used for hypermedia or 
search of objects. 

[0117] In hypermedia, when the user specifies the object with a
mouse, it is determined whether the specified time and place are
inside or outside the object region and, if it is determined that
they are inside the object region, related information about the
object is retrieved or displayed easily. When the related information
is the data that describes a program or the operation of the computer
or its pointer, the user can specify the object to cause the computer
to perform a specific operation.

[0118] In the first embodiment, any video and object may be used. For
instance, when videos are such content as movies, objects are such
characters as actors, or properties in a movie, and related
information is explanation about the actors, the viewer seeing a
movie can read a description of the desired actor by just clicking on
the actor's image. Similarly, the related information can be applied
to any type of electronic content, such as electronic encyclopedias
or electronic catalogs.
[0119] For instance, in searching for an object, the 
passing position of the object, the non-passing position 
of the object, the size of the object at a certain position, 
and the stay time at a certain position can be used as 
search keys to search for an object that satisfies the con- 
dition. For any search key, whether the condition is sat- 
isfied can be judged by extracting the coordinates of rep- 
resentative points one after another in the time period 
during which the object exists, judging whether a given 
point is inside or outside the figure composed of repre- 
sentative points, and calculating the area. 
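
The area calculation mentioned above can be done with the shoelace formula once the representative points at a given time are known. The specification does not name a formula, so this is only one possible sketch with a hypothetical function name.

```python
def polygon_area(polygon):
    """Area of a simple polygon given as a list of (x, y) vertexes (shoelace formula)."""
    n = len(polygon)
    twice_area = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


square_area = polygon_area([(0, 0), (4, 0), (4, 4), (0, 4)])
triangle_area = polygon_area([(0, 0), (4, 0), (0, 3)])
```

Evaluating this area at successive times gives the size of the object at a certain position, one of the search keys listed above.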
[0120] Furthermore, describing a keyword in the related information
enables the object to be searched for by the keyword. Moreover,
describing a feature quantity, such as shape, texture, activity, or
color, extracted from the object in the related information enables
the object to be searched for on the basis of the feature quantity.

[0121] In addition, for example, on the basis of a feature quantity,
such as the shape, texture, activity, or color of the object obtained
by analyzing the object region data, a surveillance system for
watching for a suspicious person can be realized.
[0122] Hereinafter, a method of providing video data 
and object region data will be explained. 
[0123] To provide the user with the object region data
created by the processes of the first embodiment, the 
provider needs to offer the object region data to the user 
by any suitable method. Various modes of the providing 
method can be considered as described below: 

(1) The mode of recording the video data, its object 
region data, and its related information onto a single 
recording medium (or plural recording mediums) 
and offering these data items to the user at the same 
time. 

(2) The mode of recording the video data and its object region data
onto a single recording medium (or plural recording mediums) and
offering these data items to the user at the same time, but offering
the related information separately to the user or not offering it at
all (in the latter case, for example, the user can acquire the
related information via the Internet or the like even though it is
not offered).

(3) The mode of offering the video data to the user 
independently, recording the object region data and 
related information onto a single recording medium 
(or plural recording mediums), and offering these 
data items to the user at the same time. 

(4) The mode of offering the video data, object re- 
gion data, and related data separately to the user. 



[0124] In these modes, the data items are offered mainly with a
recording medium. Alternatively, part or all of the data items may be
offered with a communication medium.

[0125] As described above, in the first embodiment, the object region
in the video can be described by the parameters of the curve that
approximates the time-sequential trajectory of the representative
points of the approximate figure (the trajectory of the coordinates
(or the quantity that enables the values to be determined) of the
representative points using the frame numbers or time stamps as
variables). Therefore, the amount of data used to determine the
object region is decreased effectively and handling is made easier.
When the object is a rigid body, the relative position varies less
than the absolute position and a function that approximates its
trajectory can be described using a smaller amount of information.
Moreover, the amount of communication in transmitting the data can be
reduced. It is easy to create an approximate figure from the
parameters of the approximate curve. When a basic figure (e.g., a
closed polygon) is used as a representative of the approximate
figure, this makes it possible to determine whether or not any
coordinate specified by the user is inside the object region
(approximate figure) (whether or not the object region has been
specified), using a simple determination equation. Therefore, it
becomes easy to specify the moving object in the video and to search
for the object based on the passing position of the object, the
non-passing position of the object, and the stay time at a certain
position. A hypermedia application that is easy to handle is thus
provided.
[0126] Other embodiments of the object data creating apparatus
according to the present invention will be described. The same
portions as those of the first embodiment will be indicated by the
same reference numerals and their detailed description will be
omitted.

Second Embodiment

[0127] A second embodiment of the present invention
is such that information on the direction of depth, in
addition to the two-dimensional information on the screen,
is included in the object region data about an object in
the image in the first embodiment. Explanation will center
on the difference between the second embodiment
and the first embodiment.

[0128] In the second embodiment, the object region
data creating apparatus of the first embodiment has to
be further provided with a processing device 108 for
obtaining information about the direction of depth
(hereinafter referred to as depth information). The depth
information processing device 108 is connected between the
video data storage device 100 and the representative
point trajectory function approximating device 104, as
shown in FIG. 15.

[0129] There are two methods of giving depth information:
one method of giving depth information in consecutive
values (Z-coordinates) and the other method
of giving depth information in discrete level values (more
preferably integral values in a specific range). When the
video data comes from a video camera, each value is
based on the data obtained by measuring the object or
is specified by the user. When the video data is artificial
(as in CG or animation), each value is based on the
depth value, if such a value is given, or is specified
by the user.

[0130] In each of the above cases, the depth information
is given to each target object or to each representative
point of an approximate figure of the target object.

[0131] In each combination of the above methods, the
depth information is given to all of the frames ranging
from the object appearing frame to the object disappearing
frame or to all of the specific sections (e.g., the adjacent
knot sections) of the frames ranging from the object
appearing frame to the object disappearing frame.
[0132] When the method of using consecutive values
as the depth information, the method of giving the depth
information to each representative point, and the method
of giving the depth information to all the frames ranging
from the object appearing frame to the object disappearing
frame are used, the Z-coordinate of each representative
point is subjected to the same process as are the
X-coordinate and Y-coordinate of each representative
point of the approximate figure of the target object in the
first embodiment (this process is carried out at the
representative point trajectory function approximating
device 104).

[0133] In this case, an example of the data format of
a representative point trajectory of the object region data
(e.g., the object region data of FIG. 8 and its variations)
is shown in FIG. 16. FIG. 16 differs from FIG. 9 in that
an arrangement of Z-coordinates of knot 832 and an
arrangement of function (Z) specifying information 836 are
added to the X-coordinate and Y-coordinate.
[0134] When the method of using consecutive values
as the depth information, the method of giving the depth
information to each target object, and the method of giving
the depth information to all the frames ranging from
the object appearing frame to the object disappearing frame
are used, the Z-coordinate of the target object is subjected
to the same process as are the X-coordinate and
Y-coordinate of each representative point of the approximate
figure of the target object in the first embodiment
(this process is carried out at the representative point
trajectory function approximating device 104).
[0135] In this case, for example, as shown in FIG. 17,
the depth information 705, or the trajectory of the value
of the Z-coordinate of the target object, is added to the
object region data (e.g., the object region data of FIG.
8 and its variations). An example of the data format of
the depth information is shown in, for example, FIG. 18.
FIG. 18 differs from FIG. 9 in that only the value of the
Z-coordinate is described.

[0136] When the level value (discrete value) is used
in the above two methods, it is expected that the same
level value will last over a plurality of frames. Therefore,
for example, each time the level value changes, the level
value after the change and the number of the frame
whose level value has changed may be described.
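
The change-based description of level values in [0136] can be illustrated by the following minimal sketch. The function names `encode_level_changes` and `level_at` and the list-of-pairs layout are hypothetical and are not taken from the data format of the embodiment.

```python
def encode_level_changes(levels, first_frame=0):
    """Return (frame_number, level) pairs, one pair per change of level value.

    `levels` holds one discrete depth level per frame, starting at
    `first_frame` (assumed layout, for illustration only).
    """
    changes = []
    previous = None
    for offset, level in enumerate(levels):
        if level != previous:
            changes.append((first_frame + offset, level))
            previous = level
    return changes


def level_at(changes, frame):
    """Look up the level value in effect at `frame`, or None before the first entry."""
    current = None
    for start, level in changes:
        if start > frame:
            break
        current = level
    return current


changes = encode_level_changes([2, 2, 2, 3, 3, 1, 1, 1], first_frame=10)
print(changes)
print(level_at(changes, 14))
```

Eight per-frame values are reduced here to three (frame, level) pairs, which is the saving the paragraph expects when a level lasts over many frames.
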
[0137] Furthermore, when the depth information is
given to the adjacent knot sections, the number of
adjacent knot sections is expected to be small relative to
the number of all the frames ranging from the object
appearing frame to the disappearing frame. Therefore, the
correspondence between all the values and the adjacent
knot sections may be described.
[0138] The following is an explanation of how the
processing device for obtaining the depth information
measures the values.

[0139] The depth information includes such absolute
information as the distance from the camera or a coordinate
in a coordinate system set in a three-dimensional
space and such relative positional information as the
moving distance from the initial object position or the
numerical value representing the magnitude of the moving
distance.

[0140] Since it is generally difficult to find absolute
positional information from the image taken by a single
camera, the positional information is acquired by making
measurements using a special range sensor as described
in Iguchi and Sato, "Three-dimensional image
measurement," Shokodo, pp. 20-52, or using a plurality
of cameras and a stereo method. When a certain imaging
condition can be assumed, however, the positional
information can be obtained even from the image taken
by a single camera. An example of this case will be given
below.

[0141] For example, in watching a road, a car 1301 is
imaged by a camera 1300 as shown in FIG. 19. Since
the camera is generally fixed, the camera 1300 can be
calibrated in advance. A plane equation can be calculated
in a three-dimensional space in advance, provided
that the road surface on which the car runs is a flat
surface. Under these preconditions, the position of a point
1306 where the tire section of the car touches the
ground 1303 is determined. On an image pickup plane
1302, the point 1306 is assumed to have been sensed
at the position of a point 1305. On this assumption, the
intersection of the viewing line 1304 of the camera passing
through the point 1305 and the plane 1303 is determined,
thereby finding the position of the point 1306.
[0142] The viewing line 1304 of the camera can be
calculated from the camera parameter obtained from
calibration. Although the road surface is assumed to be
known here, the height of the car's bumper may instead be
assumed to be known.
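
The intersection of the viewing line with the road plane described in [0141] and [0142] may be sketched as follows. This is an illustration under stated assumptions only: the plane is written as n . X + d = 0, the viewing line as X(t) = C + t r, and the function name `ground_point` and the numerical values are hypothetical rather than taken from the embodiment.

```python
def ground_point(camera_center, ray_direction, plane_normal, plane_offset):
    """Intersect the viewing line X(t) = C + t * r with the plane n . X + d = 0.

    camera_center (C) and ray_direction (r) would come from calibration;
    plane_normal (n) and plane_offset (d) describe the flat road surface.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    denom = dot(plane_normal, ray_direction)
    if abs(denom) < 1e-12:
        raise ValueError("viewing line is parallel to the plane")
    t = -(dot(plane_normal, camera_center) + plane_offset) / denom
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return tuple(c + t * r for c, r in zip(camera_center, ray_direction))


# Assumed setup: camera 5 units above the road plane y = 0,
# viewing line pointing forward and downward.
point = ground_point((0, 5, 0), (0, -1, 2), (0, 1, 0), 0.0)
print(point)
```

If the bumper height is assumed known instead, the same routine can be used with the plane offset shifted by that height.
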

[0143] For example, in the information processing
system that handles the video data and its object region
data, objects can be searched for using these
three-dimensional data items.
[0144] FIG. 20 is a flowchart for such a searching
process.
[0145] First, positional information about the specified
object to be searched for is inputted.
[0146] At step S2701, the distance between its position
and the object's three-dimensional position related
to the whole object region data is calculated.
[0147] After the three-dimensional distance has been
calculated for all the objects, the objects whose distance
is smaller than a threshold value are found and outputted.
Instead of determining the threshold value, the object
whose distance is the smallest may be outputted as
the result of the searching.
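
The search of [0146] and [0147] can be illustrated by the sketch below. The dictionary layout, the object identifiers, and the function name `search_by_position` are assumptions made for the example; only the threshold test and the smallest-distance variation come from the description.

```python
import math


def search_by_position(query, objects, threshold=None):
    """Return ids of objects near `query` (a 3-D position).

    objects: dict mapping an object id to its (x, y, z) position.
    With a threshold, every object closer than the threshold is returned;
    without one, only the object at the smallest distance is returned.
    """
    distances = {oid: math.dist(query, pos) for oid, pos in objects.items()}
    if threshold is None:
        return [min(distances, key=distances.get)] if distances else []
    return sorted(oid for oid, d in distances.items() if d < threshold)


objects = {
    "car_a": (0.0, 0.0, 10.0),
    "car_b": (3.0, 0.0, 14.0),
    "car_c": (40.0, 0.0, 80.0),
}
near = search_by_position((0.0, 0.0, 12.0), objects, threshold=5.0)
nearest = search_by_position((0.0, 0.0, 12.0), objects)
print(near, nearest)
```
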

[0148] It is difficult to determine the absolute positional
information about the object in the video from an
ordinary video alone. In the case of the image of a car coming
closer from FIG. 21A to FIG. 21B taken by a stationary
camera, observing changes in the size of the car on the
image screen makes it possible to determine such relative
depth information as tells whether the car was
coming closer to or going farther away from the camera.
An example of this case will be given below.
[0149] An ordinary camera optical system can be illustrated
using a perspective transformation model
based on a pinhole camera as shown in FIG. 22. Numeral
1600 is the lens principal point of a camera and
1601 an imaging plane. It is assumed that an object
1602 is moving closer to the camera. FIG. 23 is a view
of the situation taken from above. As shown in FIG. 23,
it is assumed that the object moves closer to the camera,
while keeping parallel with the Z-axis. The width 1704
of the image of the front side of the object 1704 before
movement increases to the width 1705 of the image of
the front side of the object 1705 after movement. The
smaller the distance between the object and the camera
lens principal point 1700, the larger the image. Thus,
changes in the relative position can be expressed using
the size of the image. For example, let the width of the
image at the initial position of the object be 1. On this
assumption, the ratio of the width of a subsequent image
to the initial width is calculated. Since the width of the
image can be considered to be proportional to the reciprocal
of the distance from the lens principal point 1700,
the reciprocal of the value of the ratio is calculated and
held as the depth information. In this case, the closer
the car gets to the camera, the smaller the value. The
farther the car goes away from the camera, the larger
the value. Instead of width, the area of the image or the
area of a characteristic texture of the object surface may
be used.
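
As a minimal sketch of the calculation in [0149], assuming the image width of the object has already been measured in each frame, the relative depth value can be obtained as the reciprocal of the width ratio. The function name `relative_depths` and the sample widths are hypothetical.

```python
def relative_depths(widths):
    """Return one relative depth value per frame.

    widths[0] is the image width at the initial position (treated as 1).
    Each value is the reciprocal of (width / initial width), so it shrinks
    as the object approaches the camera and grows as it recedes.
    """
    if not widths or any(w <= 0 for w in widths):
        raise ValueError("widths must be a non-empty list of positive numbers")
    initial = widths[0]
    return [initial / w for w in widths]


depths = relative_depths([40.0, 50.0, 80.0, 20.0])
print(depths)
```

The third frame, where the image is twice as wide as at the start, yields a value of 0.5; the last frame, where it is half as wide, yields 2.
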

[0150] For example, in the image processing system
that handles video data and its object region data, use
of information that tells changes in these relative positions
makes it possible to find the time when one moving
object will be at a specified distance.
[0151] FIGS. 24 and 25 are flowcharts for examples
of the process in this case.

[0152] FIG. 24 is a flowchart for the preprocess carried
out before actually making a search. In FIG. 24, the depth
values one moving object holds are normalized. At step
S2800, the smallest of the depth values is taken to be 1. At
step S2801, each depth value is normalized by dividing it by
the smallest value. At step S2802, it is determined whether
all the processes have been completed.
[0153] Next, at step S2900 in FIG. 25, the depth value
is inputted. At step S2901, the difference between the
input value and the depth value is calculated. After the
input value has been compared with all the depth values
(step S2902), the time at which the difference is the
smallest is outputted (step S2903).
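
The preprocess of FIG. 24 and the search of FIG. 25 may be sketched as follows, assuming the depth values and their time stamps are held in two parallel lists. The function names and sample values are hypothetical.

```python
def normalize_depths(depths):
    """Preprocess: divide every depth value by the smallest one, so the minimum becomes 1."""
    smallest = min(depths)
    return [d / smallest for d in depths]


def time_closest_to(depths, times, target):
    """Search: return the time whose depth value differs least from `target`."""
    best = min(range(len(depths)), key=lambda i: abs(depths[i] - target))
    return times[best]


times = [0.0, 0.5, 1.0, 1.5]
normalized = normalize_depths([4.0, 3.2, 2.0, 8.0])
hit = time_closest_to(normalized, times, target=1.5)
print(normalized, hit)
```
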
[0154] With the second embodiment, adding depth
information to the two-dimensional positional information
(plane information) makes it possible to
search for an object, taking into account the direction of
depth, for example, the distance information from the
camera.

Third Embodiment 

[0155] A third embodiment of the present invention is
such that display flag information is further included in
the object region data in the video in the first or second
embodiment. The display flag information is related to
a display flag that indicates whether an object (or part
of the object) is visible or invisible because it hides behind
another object. Explanation will center on the difference
between the third embodiment and the first or
second embodiment.

[0156] In the third embodiment, a process related to 
the display flag is carried out at, for example, the repre- 
sentative point trajectory function approximating device 
104. 

[0157] For instance, as shown in FIG. 26A to FIG. 
26C, when there are a plurality of objects in the video, 
an object 2101 may often disappear behind another ob- 
ject 2102 and appear from behind the object 2102. To 
describe this state, display flag information is added to 
the object region data. 

[0158] There are two methods of giving the display
flag: one method of giving the display flag to each target
object and the other method of giving the display flag to
each representative point of an approximate figure for
the target object.
[0159] When the display flag is given to each object,
if the display flag is set, this means that the object does
not hide behind another object. In this case, the object
is displayed in reproduction. If the display flag is not set,
this means that the object hides behind another object.
In this case, the object is not displayed in reproduction.
[0160] When the display flag is given to each representative
point of an approximate figure for the target
object, if the display flags for all the representative
points of an approximate figure for one target object are
in the same state, the object is displayed or not displayed
as described above. If the display flags for some
representative points are set and those for the remaining
ones are not set, the object is displayed, taking the
situation into account (for example, only the corresponding
part of the object is displayed).
[0161] A display flag is given to each interval between
key points. It is determined at the same time as the
representative point trajectory data about the object region is
created. Key points may be provided independently of
the knots of an approximate function or in such a manner
that they never fail to fall on the knots. For instance,
when a key point occurs, that point of time may be forcibly
made a knot.

[0162] When a display flag is given to each target object,
a key point is set when the object changes from the
visible state to the invisible state or vice versa. In an
example in FIG. 27, an object 2201 is visible until frame
i and disappears from frame i to frame j. When the object
appears again from frame j onward, key points
are placed at frame i and frame j. Then, the disappearing
state is set to the display flags for frame i to frame j and
the appearing state is set to the display flags for the
remaining frames. The same holds true when a display
flag is given to each representative point of an approximate
figure for the target object.
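
The placement of key points at visibility changes, with one display flag per interval between key points, can be sketched as follows. The per-frame boolean list and the function name `key_points_and_flags` are assumptions for illustration; the first and last frames are treated as key points so that every interval is bounded.

```python
def key_points_and_flags(visible, first_frame=0):
    """Derive key points and per-interval display flags from per-frame visibility.

    visible: list of booleans, one per frame.
    Returns (key_frames, flags) where flags[k] (1 = visible, 0 = invisible)
    applies from key_frames[k] up to key_frames[k + 1].
    """
    key_frames = [first_frame]
    flags = [1 if visible[0] else 0]
    for offset in range(1, len(visible)):
        if visible[offset] != visible[offset - 1]:
            # The state changes here, so a key point is placed.
            key_frames.append(first_frame + offset)
            flags.append(1 if visible[offset] else 0)
    key_frames.append(first_frame + len(visible) - 1)  # closing key point
    return key_frames, flags


key_frames, flags = key_points_and_flags([True, True, True, False, False, True, True])
print(key_frames, flags)
```

In this example the object is hidden in frames 3 and 4, so key points fall at frames 3 and 5, and the number of flags equals the number of key points minus 1.
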
[0163] The representative point trajectory data is created
on the assumption that the object is visible over all
the frames. When information about the representative
points is unknown because the object hides behind another
object, the representative point trajectory data is
created by supplementing the data with information
about the representative points before and after the unknown
representative points. After the representative
point trajectory data has been created, a flag is set,
depending on whether the object is visible or invisible.
Therefore, even when an object appears and disappears,
it can be expressed by a series of representative
point trajectory data items.

[0164] Hereinafter, variations of the display flag
information will be described.

[0165] Although a display flag is normally set to each
interval between key points, a start time stamp and an
end time stamp may be added to a display flag itself.
This has the merit of being able to set a visible range
and an invisible range independently of key points.
[0166] A display flag may be given to each object.
Alternatively, it may be given independently to each
representative point trajectory data item. For instance,
when an object is represented by a polygon and its individual
vertexes are expressed as representative
points using trajectory data, giving a display flag to each
representative point trajectory data item enables only an
invisible part of the object to be represented.
[0167] In addition to showing whether the object is visible
or invisible, the display flag may take the value of
an integer representing priority. When objects overlap
with each other, this means that an object with lower priority
hides behind an object with higher priority and only
the object with higher priority is displayed. It is assumed
that, when the priority is 0, the object is invisible, regardless
of other objects.

[0168] Use of integer values as display flags has the
advantage that an object overlapping problem can be
solved even when other objects are combined with the
object in the image. In using integer values as display
flags, a display flag may be given to each object or to
each representative point trajectory data item.
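
How an integer display flag could settle an overlap may be sketched as follows. The description does not state whether a larger integer means higher priority; the sketch assumes that it does, and the function name `displayed_object` is hypothetical.

```python
def displayed_object(priorities):
    """Pick the object displayed where several objects overlap.

    priorities: dict mapping an object id to its integer display flag.
    Assumption: a larger integer means higher priority. A flag of 0 marks
    an object as invisible regardless of the other objects.
    """
    candidates = {oid: p for oid, p in priorities.items() if p > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(displayed_object({"person": 1, "car": 3, "sign": 0}))
print(displayed_object({"sign": 0}))
```
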
[0169] FIGS. 28 and 29 show examples of the structure
of the object region data including display flags.
[0170] FIG. 28 shows an example of adding display
flag information 706 to the object region data (for example,
that in FIG. 8 or its variations) when a display flag
is added to the target object (of course, there is an example
of further adding related information to the object
region data).

[0171] FIG. 29 shows an example of the structure of
display flag information 706.

[0172] In this example, each display flag 2304 has a
start time stamp 2302 and an end time stamp 2303.
When the start time stamp 2302 and end time stamp
2303 are not used, the total number of display flags
equals the number of key points minus 1, so that
the number of display flags P 2301 may be omitted. Display
flag 2304 takes the value of 0 or 1 to indicate appearance
or disappearance. It may take an integer value
to represent priority.

[0173] When a display flag is given to each representative
point of an approximate figure for the object, display
flag information is added to, for example, each
representative point trajectory of the object region data (for
example, that in FIG. 8 or its variations).
[0174] FIG. 30 is an example of the structure of the 
representative point trajectory data in that case. An ex- 
ample of the structure of display flag 900 in FIG. 30 is 
as described above. 

[0175] FIG. 31 is a flowchart for an example of the 
searching process at the information processing system 
that handles video data and its object region data. 
[0176] First, at step S251, the user inputs a search
key. Then, the distance between the information for the
object region being searched for and the
search key is calculated.
[0177] At step S254, it is determined whether or not
the display flag for the object region corresponding to
the search key is visible. If the display flag is invisible,
matching is considered to be unsuccessful.
[0178] At step S255, when a display flag is visible and
the distance is smaller than a threshold value, matching
is considered to be successful and recording is done.
[0179] This is carried out for all the objects. When it
is determined at step S252 that calculations have been
done for all the object regions, then the result of the
calculations is outputted at step S256, which completes the
process.
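
The matching loop of FIG. 31 may be sketched as follows, assuming a positional search key and a Euclidean distance. The record layout, the field names, and the function name `search_visible` are hypothetical.

```python
import math


def search_visible(search_key, object_regions, threshold):
    """Return ids of object regions that match `search_key` and are visible.

    object_regions: list of dicts with 'id', 'position' and 'visible' entries
    (an assumed layout). A region whose display flag is invisible never
    matches, whatever its distance from the search key.
    """
    matches = []
    for region in object_regions:
        distance = math.dist(region["position"], search_key)
        if not region["visible"]:
            continue  # invisible: matching is unsuccessful
        if distance < threshold:
            matches.append(region["id"])  # visible and close enough: record it
    return matches


regions = [
    {"id": "a", "position": (10.0, 10.0), "visible": True},
    {"id": "b", "position": (11.0, 10.0), "visible": False},
    {"id": "c", "position": (90.0, 90.0), "visible": True},
]
print(search_visible((10.0, 11.0), regions, threshold=3.0))
```
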

[0180] As described above, the addition of display
flags makes it possible to determine whether or not the
object is occluded (visible or invisible in reproduction),
without making calculations from the relationship with
other objects. This enables the displayed object to be
searched for efficiently.

Fourth Embodiment 

[0181] A fourth embodiment of the present invention
is such that information indicating the range over which
an object in the video passed on the screen during the
time from when it appeared on the screen until it disappeared
(hereinafter referred to as object passing range
data) is also included in the object region data in the
first, second, or third embodiment. Explanation will center
on the difference between the fourth embodiment and
the first, second, or third embodiment.
[0182] In the fourth embodiment, there is further provided
a processing device for creating object passing
range information which is connected between the region
extracting device 101 and the region figure approximating
device 102.

[0183] When an object is represented by the repre- 
sentative point trajectory data about the object region, 
one object is normally represented using a plurality of 
trajectory data items. In searching for an object that 
passed the specified point, it would be convenient for 
the object passing range to be represented without cal- 
culating the object region from a plurality of trajectory 
data items. 

[0184] To achieve this, object passing range information
about such a minimum rectangle or polygon as encloses
the whole trajectory of the object is created. This
information is added to the object region data.
[0185] When a rectangle is used, it may or may not
have an inclination. Use of a rectangle with an inclination
has the advantage that the trajectory of the object region
can be approximated with smaller errors. Use of a rectangle
with no inclination has the advantage that it is
easy to calculate parameters for the rectangle.
[0186] In FIG. 32A, numeral 2402 shows an example
of approximating the trajectory region of an object 2401
using a rectangle with no inclination.
[0187] In FIG. 32B, numeral 2403 shows an example
of approximating the trajectory region of the object 2401
using a rectangle with an inclination.
[0188] In FIG. 32C, numeral 2404 shows an example
of approximating the trajectory region of an object 2401
using a polygon.

[0189] To calculate such a minimum rectangle or polygon
as encloses the whole trajectory of the object, the
region is found in each frame, then the logical sum of
the regions over all the frames is calculated, and thereafter
the resulting logical sum region is approximated by
the smallest rectangle or polygon.
[0190] In calculating such a minimum rectangle or polygon
as encloses the whole trajectory of the object, the
logical sum of the smallest rectangle or polygon that encloses
the whole trajectory of the object region related
to the already calculated frames and the object region
in a newly added frame may be calculated and the resulting
logical sum region may be approximated by the
smallest rectangle or polygon.
[0191] Furthermore, when such a minimum rectangle
or polygon as encloses the whole trajectory of the object
is calculated, such a minimum rectangle or polygon as
encloses the trajectory of each representative point may
be calculated and then such a minimum rectangle or polygon
as encloses the logical sum of the regions of the
rectangles or polygons obtained over all the trajectories
may be calculated.
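
The frame-by-frame update of [0190] can be illustrated for the simplest case, a rectangle with no inclination and polygonal approximate figures given by their vertexes. The function name `enclose` and the sample figures are hypothetical; a rectangle with an inclination or a polygon would need a different routine.

```python
def enclose(rect, points):
    """Grow an axis-aligned rectangle (xmin, ymin, xmax, ymax) to cover `points`.

    rect is None before the first frame has been processed.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if rect is not None:
        xs += [rect[0], rect[2]]
        ys += [rect[1], rect[3]]
    return (min(xs), min(ys), max(xs), max(ys))


# One approximate figure (list of vertexes) per frame, as assumed sample data.
frames = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],
    [(3, 1), (5, 1), (5, 2), (3, 2)],
    [(6, 0), (8, 0), (8, 3), (6, 3)],
]
passing_range = None
for vertices in frames:
    passing_range = enclose(passing_range, vertices)
print(passing_range)
```

Because only the rectangle accumulated so far and the newly added frame are combined at each step, the regions of earlier frames do not need to be kept.
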

[0192] FIG. 33 shows object passing range information
added to the object region data. Circumscribing figure
type 3401 indicates the rectangle with no inclination
as shown in FIG. 32A if it is 0, the rectangle with an
inclination as shown in FIG. 32B if it is 1, and the polygon
as shown in FIG. 32C if it is 2. Number of apexes N 3402
is 2 if the circumscribing figure type 3401 is 0, 3 if the
circumscribing figure type 3401 is 1, and an arbitrary number
if the circumscribing figure type 3401 is 2. If the object
has depth information, a three-dimensional circumscribing
figure is introduced and the object passing range
information is added with the depth information as shown
in FIG. 34.

[0193] FIG. 35 is a flowchart for an example of the
process of, when the user specifies a coordinate, selecting
such an object as passes the coordinate at the
information processing system that handles, for example,
video data and its object region data.
[0194] At step S261, the coordinates to be searched for
are inputted. Next, the smallest rectangle or polygon that
encloses the whole trajectory is compared with the inputted
coordinates, and only objects for which the inputted
coordinates are inside the smallest rectangle or polygon
which encloses the whole trajectory are extracted (the
number of extracted objects may be 0, 1, or more). At
step S263, it is determined for the extracted objects,
from the representative point trajectory, whether or not
the inputted coordinates are in the object region (for
example, inside the approximate figure).
[0195] Generally, judging the inside or outside of the
smallest rectangle or polygon that encloses the whole
trajectory requires a smaller amount of calculations than
judging the inside or outside of the object based on the
representative point trajectory. When the number of objects
to be searched for is large, first judging the inside
or outside of the smallest rectangle or polygon that encloses
the whole trajectory enables an efficient search.
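
The two-stage judgment can be sketched as follows: a cheap test against the passing range, followed by a point-in-polygon test only for the objects that survive. The sketch assumes a rectangle with no inclination and per-frame polygons already reconstructed from the trajectory data; the even-odd ray-casting test, the record layout, and the function names are choices made for this example and are not taken from the embodiment.

```python
def in_rect(rect, point):
    """Inside test for an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def in_polygon(polygon, point):
    """Even-odd ray-casting test for a simple polygon given as a vertex list."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def objects_passing(point, objects):
    """Return ids of objects whose region contains `point` in some frame."""
    hits = []
    for obj in objects:
        if not in_rect(obj["range"], point):
            continue  # rejected by the passing range alone
        if any(in_polygon(fig, point) for fig in obj["figures"]):
            hits.append(obj["id"])
    return hits


objects = [
    {"id": "a", "range": (0, 0, 8, 3),
     "figures": [[(0, 0), (2, 0), (2, 1), (0, 1)],
                 [(3, 1), (5, 1), (5, 2), (3, 2)],
                 [(6, 0), (8, 0), (8, 3), (6, 3)]]},
    {"id": "b", "range": (20, 20, 30, 30),
     "figures": [[(20, 20), (30, 20), (30, 30), (20, 30)]]},
]
print(objects_passing((4, 1.5), objects))
print(objects_passing((4, 2.8), objects))
```

The second query lies inside the passing range of object "a" but in none of its per-frame figures, which shows why the second stage is still required.
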
[0196] As described above, adding information about
the smallest rectangle or polygon that encloses the
whole trajectory of the object enables the passing range
of the object to be represented efficiently. This makes it
easier to determine whether an object passes a certain
point.

[0197] To increase search efficiency, not only expressing
the object region in a function but also giving
a figure enclosing the position in which an object exists
temporally and spatially makes it possible to eliminate
objects located in completely different places from the
things to be searched for.

Fifth Embodiment

[0198] The fifth embodiment of the present invention
is such that the invention is applied to mosaicking.
[0199] Mosaicking is a method of combining pictures
taken in such a manner that they are partially overlapped
with each other to form a single wide-range picture.
Such a combined picture is called a panorama picture.
A plurality of methods of forming a panorama picture
from a plurality of pictures have been proposed (as
described in, for example, M. Irani and P. Anandan, "Video
Indexing Based on Mosaic Representations," Proceedings
of the IEEE, Vol. 86, No. 5, May 1998, pp.
905-921.).
[0200] The configuration of the fifth embodiment is basically
the same as that of each of the first to fourth embodiments.
The fifth embodiment differs from the first to
fourth embodiments in that the representative points of
an approximate figure are represented by a coordinate
system of the whole panorama picture, not by coordinate
systems of the respective pictures.
[0201] Hereinafter, explanation will center on the difference
between the fifth embodiment and the first to
fourth embodiments.
[0202] FIG. 36 is a flowchart for an example of
processing by an object region data describing method
using mosaicking techniques. FIGS. 37A and 37B are
diagrams to help explain the method.
[0203] A panorama picture is formed by combining the
still pictures. The coordinates of each pixel of the individual
still pictures before combination are converted using a certain
reference point (for example, the bottom left point in a
frame) in the panorama picture as the origin. As a result,
the individual representative points of an approximate
figure for the object region in each still picture become
a series of X-coordinates or a series of Y-coordinates in
a coordinate system for the panorama image. In this
embodiment, a series of X-coordinates or Y-coordinates
of the individual representative points of an approximate
figure for the object region in each still picture is
approximated using a function as in the first to fourth
embodiments. For example, a difference vector is taken in
a single still picture or between still pictures. A series of
the coordinates of the difference vector is approximated
using a function.

[0204] At step S1900, a panorama picture is formed
from a plurality of still pictures inputted. These input
images are shown as 2000 to 2005 in FIG. 37A. They were
obtained by photographing a moving body, while moving
a camera. Numeral 2006 is an object. Numerals 2000
to 2005 indicate frames in which the same object was
photographed. These pictures are often consecutive
frames in a moving picture or still pictures photographed
in such a manner that the camera was so moved that
the photographic ranges may overlap with each other.
[0205] In FIG. 37B, numeral 2007 indicates a panorama
picture obtained by combining these input pictures.

[0206] At step S1901, the individual object regions existing
in the resulting panorama picture are approximated
using figures. The panorama picture formation at
step S1900 and the figure approximation of the object
region at step S1901 may be reversed in order. Depending
on the conversion in forming a panorama picture, the
type of approximate figure for the object region may
have to be changed. For example, in a case where the
object region is approximated using a rectangle, when
a panorama picture is formed by affine transformation,
the resulting object region is not necessarily a rectangle.
In this case, the panorama picture is formed first.
Alternatively, the formed panorama picture is converted and
the converted picture is modified.
[0207] At step S1902, the representative points or
characteristic points of an approximate figure for the object
region obtained at step S1901 are approximated using
a function. The trajectory of the object region is obtained
by determining a reference object region and
finding the amount of change in each object region on
the basis of the reference object region. For example,
in FIG. 37B, the object region 2008 of a first input image
is used as a reference and changes in the object region
following the reference one are made a trajectory 2009.
In this example, the center of gravity of the object region
is used as a representative point. The same holds true
when a representative point of another approximate figure,
such as a rectangle or an ellipse, is used, or when
another characteristic point is used as a representative
point.

[0208] There are two methods of determining the 
amount of change from the reference point: one method 
of using the difference from the reference point and the 
other method of using the difference from the preceding 
object region. The amount of change can be approxi- 
mated using a function. A change from the reference 
point can be approximated using a motion model, such 
as a parallel/rotational movement or affine transforma- 
tion, not using the movement of representative points or 
characteristic points. Then, the movement of the object
is described as the trajectory of its conversion coefficient.
In this case, too, the trajectory of the conversion
coefficient is approximated using a function.
[0209] At step S1903, the parameter of the function 
that approximates the trajectory found at step S1902 is 
described according to the format of the aforementioned 
data structure. 

[0210] The parameters used in forming a panorama 
picture from the individual input pictures can be de- 
scribed in the same manner, considering all the input 
pictures as object regions. 

[0211] FIG. 38 shows panorama parameters added
to the object region data. The parameters indicate a coordinate
system of the panorama picture using the coordinates
of the representative points in the respective
pictures and a conversion coefficient from the coordinate
system of the respective frames to the coordinate system
of the panorama frame. Though the location of the
origin of the coordinate system may be freely set, it is
assumed in this embodiment that the origin is set to the
bottom left corner of the frame. The width and the length
of the frames forming the panorama picture are constant
and known. Panorama flag 3601 shows whether or not
the coordinate system of the panorama picture is applied.
If the flag is 0, the coordinate system of the panorama
picture is not used (the bottom left corner of each
picture is the origin). If the flag is 1, the coordinate system
of the panorama picture is used (the coordinate of
each picture is converted into that of the panorama picture).
Model type M 3602 shows a conversion from
each frame to the panorama picture. The model type indicates
no conversion if it is 0, translation if it is 2, rotation/scaling
if it is 4, affine conversion if it is 6, perspective conversion
if it is 8, and quadratic conversion if it is 12. The
number of parameters of each model equals the
value of the model type M.
[0212] Translational model:

v_x(x, y) = a_1

v_y(x, y) = a_2

[0213] Rotation/scaling model:

v_x(x, y) = a_1 + a_3 x + a_4 y

v_y(x, y) = a_2 - a_4 x + a_3 y

[0214] Affine model:

v_x(x, y) = a_1 + a_3 x + a_4 y

v_y(x, y) = a_2 + a_5 x + a_6 y

[0215] Perspective model:

v_x(x, y) = (a_1 + a_3 x + a_4 y) / (1 + a_7 x + a_8 y)

v_y(x, y) = (a_2 + a_5 x + a_6 y) / (1 + a_7 x + a_8 y)

[0216] Quadratic model:

v_x(x, y) = a_1 + a_3 x + a_4 y + a_7 xy + a_9 x^2 + a_{10} y^2

v_y(x, y) = a_2 + a_5 x + a_6 y + a_8 xy + a_{11} x^2 + a_{12} y^2
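
The five models above can be evaluated by a single routine that
dispatches on the model type M. The following is a minimal sketch,
not the format's reference implementation: the function name
`motion_vector` is hypothetical, the parameters are assumed to be
supplied in the order a_1 to a_M, and returning a zero vector for
model type 0 is an assumption, since the text only says that no
conversion takes place. How (v_x, v_y) is combined with (x, y) and
with the origin for conversion is not modelled here.

```python
def motion_vector(model_type, a, x, y):
    """Evaluate (v_x, v_y) at (x, y) for the model whose type is M.

    `a` is the list [a_1, ..., a_M]; its length must equal M.
    """
    if len(a) != model_type:
        raise ValueError("the number of parameters must equal the model type M")
    p = [0.0] + list(a)  # shift so that p[i] is a_i
    if model_type == 0:  # no conversion (zero vector is an assumption)
        return (0.0, 0.0)
    if model_type == 2:  # translational model
        return (p[1], p[2])
    if model_type == 4:  # rotation/scaling model
        return (p[1] + p[3] * x + p[4] * y,
                p[2] - p[4] * x + p[3] * y)
    if model_type == 6:  # affine model
        return (p[1] + p[3] * x + p[4] * y,
                p[2] + p[5] * x + p[6] * y)
    if model_type == 8:  # perspective model
        d = 1 + p[7] * x + p[8] * y
        return ((p[1] + p[3] * x + p[4] * y) / d,
                (p[2] + p[5] * x + p[6] * y) / d)
    if model_type == 12:  # quadratic model
        return (p[1] + p[3] * x + p[4] * y
                + p[7] * x * y + p[9] * x * x + p[10] * y * y,
                p[2] + p[5] * x + p[6] * y
                + p[8] * x * y + p[11] * x * x + p[12] * y * y)
    raise ValueError("unknown model type")
```

Checking the length of `a` against M mirrors the rule that the
number of parameters equals the model type.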

[0217] The origin for conversion is defined as X-coordinate
3603 and Y-coordinate 3604, which are represented in the
coordinate system of the respective pictures. The provision of
the origin for conversion makes the conversion error small.
Number of conversion parameters N 3605 equals the number of
frames in the panorama picture. Frame interval time period
3606 is counted from the first frame. Set of parameters 3607
describes the M parameters depending on the model type. The
trajectory of the object of each frame is described using this
set of parameters.

[0218] When shooting is done while the camera is following the
object region, a panorama picture is formed by mosaicking,
whereby consecutive frames are image-transformed and then tied
together. Describing the object region information on the
formed image makes it possible to describe the object region
information uniquely in a coordinate system with a certain
point on the mosaicking image as a cardinal point, even when
the camera is moving.

[0219] The second, third, fourth, and fifth embodiments are
described in connection with the first embodiment in which the
object region data is described using the differential vector
of the representative points of the approximate figure.
However, these embodiments for adding the depth information,
display flag, passing range information, and panorama
conversion parameters for mosaicking can be freely applied to
any type of object region data. The following description will
be focused on the variation of the object region data. Though
the embodiments related to the combination of the depth
information and object region data of other types will be
described, it will be understood that the display flag,
passing range information, and panorama conversion parameters
for mosaicking can be applied to the object region data of the
other types.

Sixth embodiment 

[0220] In the sixth embodiment, the depth information is added
to the object region data which is described using the
trajectory of the coordinates of the representative points of
the approximate figure.
[0221] The configuration of the object region data creating
apparatus of the sixth embodiment is the same as that of the
first embodiment shown in FIG. 1. Though the object region is
approximated using the polygon in the first embodiment, the
object region is approximated using an ellipse in the sixth
embodiment, as shown in FIGS. 39A to 39D. FIGS. 39A to 39D
correspond to FIGS. 3A to 3C of the first embodiment. The
region is approximated with an ellipse by extracting two focal
points v1 and v2 of the ellipse and one point v3 on the
ellipse, and the representative point trajectory curve is
approximated with a spline function.
[0222] FIG. 40 shows an example of the method of obtaining an
approximate ellipse when the object region is expressed by a
parallelogram. Points A, B, C and D shown in FIG. 40 are
vertices of the parallelogram which is the object region.
First, it is determined which of side AB and side BC is the
longer side. Then, a smallest rectangle having portions of its
sides which are the longer side and its opposite side is
determined. In the case shown in FIG. 40, a rectangle having
four points A, B', C and D' is the smallest rectangle. The
approximate ellipse is a circumscribing ellipse similar to the
ellipse inscribing the rectangle and passing the points A, B',
C and D'.

[0223] Referring to FIG. 39B, reference numerals v1, v2 and v3
represent representative points of a figure expressing an
ellipse. Specifically, the representative points v1 and v2 are
two focal points of the ellipse and v3 is one point on the
same (one point on the minor axis in the case shown in FIG.
39B). The focal points of the ellipse can easily be determined
from points on the two axes or a circumscribing rectangle of
the ellipse. An example will now be described with which focal
points F and G are determined from two points P_0 and P_1 on
the major axis and point H on the minor axis shown in FIG. 41.
[0224] Initially, a and b which are parameters of the major
axis and the minor axis, center C of the ellipse and
eccentricity e are determined as follows:

E(P_0, P_1) = 2a

C = (P_0 + P_1)/2

E(C, H) = b

e = (1/a) \sqrt{a^2 - b^2}

where E(P, Q) is the Euclidean distance between the point P
and the point Q. In accordance with the determined parameters,
the focal points F and G can be determined as follows:

F = C + e (P_0 - C)

G = C - e (P_0 - C)
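
As an illustration of the computation above, the following sketch
derives the focal points from the two major-axis points and the
minor-axis point. The function name `ellipse_foci` and the
representation of points as (x, y) tuples are assumptions made for
the example; it also assumes a >= b, i.e. that P_0 and P_1 really
lie on the major axis.

```python
import math


def ellipse_foci(p0, p1, h):
    """Return focal points (F, G) from major-axis points p0, p1 and
    minor-axis point h, each given as an (x, y) tuple."""
    a = math.dist(p0, p1) / 2.0                  # E(P_0, P_1) = 2a
    c = ((p0[0] + p1[0]) / 2.0,
         (p0[1] + p1[1]) / 2.0)                  # center C
    b = math.dist(c, h)                          # E(C, H) = b
    e = math.sqrt(a * a - b * b) / a             # eccentricity
    dx, dy = p0[0] - c[0], p0[1] - c[1]          # vector P_0 - C
    f = (c[0] + e * dx, c[1] + e * dy)           # F = C + e (P_0 - C)
    g = (c[0] - e * dx, c[1] - e * dy)           # G = C - e (P_0 - C)
    return f, g
```

For P_0 = (-5, 0), P_1 = (5, 0) and H = (0, 3), this gives a = 5,
b = 3 and e = 0.8, so the focal points lie at a distance of 4 from
the center along the major axis.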

[0225] Thus, the representative points F, G and H of the
ellipse are determined. When the foregoing points are made to
correspond to the representative points of the ellipse
extracted in another frame, ambiguity is involved. That is,
two combinations exist which make the two extracted focal
points correspond to the two focal points in the previous
frame. Moreover, since two intersections exist between the
minor axis and the ellipse, the intersection corresponding to
the one point on the ellipse extracted in the previous frame
cannot be determined. A method of determining the combination
and the intersection will now be described.

[0226] An assumption is made that the two focal points
extracted in the previous frame are Fp and Gp. To determine
which of F and G corresponds to Fp, the following comparison
is made:

E((Gp - Fp)/2, (G - F)/2)

and

E((Gp - Fp)/2, (F - G)/2)

[0227] When the former value is smaller, Fp is made to
correspond to F, and Gp is made to correspond to G. When the
latter value is smaller, Fp is made to correspond to G, and Gp
is made to correspond to F.
[0228] An assumption is made that the intersection between the
minor axis and the ellipse in the previous frame is Hp and the
intersections between the minor axis and the ellipse in the
present frame are H and H'. The point H or H' which must be
made to correspond to Hp is determined by calculating two
distances:

E(Hp - (Gp + Fp)/2, H - (F + G)/2)

and

E(Hp - (Gp + Fp)/2, H' - (F + G)/2)

[0229] When the former distance is shorter, H is selected.
Otherwise, H' is selected. Note that the intersection H
between the minor axis and the ellipse in the first frame may
be either of the two intersections.
[0230] The foregoing process for extracting the representative
points from the ellipse is performed by the representative
point extracting device 103.
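
The two comparisons used to resolve the ambiguity can be sketched
as follows. The function names `match_foci` and
`match_minor_point` are hypothetical, points are assumed to be
(x, y) tuples, and the handling of exact ties is an assumption
because the text does not specify it.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def _half(p):
    return (p[0] / 2.0, p[1] / 2.0)


def _mid(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def match_foci(fp, gp, f, g):
    """Return (point matched to Fp, point matched to Gp)."""
    prev = _half(_sub(gp, fp))
    d_same = math.dist(prev, _half(_sub(g, f)))   # E((Gp-Fp)/2, (G-F)/2)
    d_swap = math.dist(prev, _half(_sub(f, g)))   # E((Gp-Fp)/2, (F-G)/2)
    # Ties keep the extracted order (assumption).
    return (f, g) if d_same <= d_swap else (g, f)


def match_minor_point(hp, fp, gp, h, h_alt, f, g):
    """Return whichever of h and h_alt (H and H') corresponds to Hp."""
    prev = _sub(hp, _mid(gp, fp))                 # Hp relative to old center
    d_h = math.dist(prev, _sub(h, _mid(f, g)))
    d_alt = math.dist(prev, _sub(h_alt, _mid(f, g)))
    return h if d_h < d_alt else h_alt
```

Both tests compare vectors taken relative to the ellipse itself,
so a pure translation of the ellipse between frames does not
affect the chosen correspondence.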
[0231] The positions of the representative points extracted by
the foregoing process usually vary among the successive frames
owing to movement of the object of interest in the video or
shaking of the image pick-up camera. Therefore, the
corresponding representative points of the ellipses are
time-sequentially arranged to perform approximation with a
spline function for each of the X and Y axes. In this
embodiment, each of the three points F, G and H (see FIG. 41)
which are the representative points of the ellipse requires a
spline function for each of the X- and Y-coordinates.
Therefore, six spline functions are produced.
[0232] The approximation to a curve with spline func- 
tions is performed by the representative point trajectory 
function approximating device 104. 
[0233] The process which is performed by the representative
point trajectory function approximating device 104 may be
carried out each time the coordinates of the representative
points of a frame relating to the object region are obtained.
In that case, an approximation error is also obtained, and the
approximation section is divided as required so that the
approximation error stays within a predetermined range.
Another method may be employed with which the process is
performed after the coordinates of the representative points
in all of the frames relating to the object region have been
obtained.
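
The error-driven division of the approximation section can be
illustrated with a deliberately simplified sketch. It is not the
method of the approximating device 104: it assumes first-order
pieces that pass through the samples at the knots, whereas the
embodiment allows higher-order spline functions. The function
name `divide_sections` and the tolerance parameter `tol` are
hypothetical.

```python
def divide_sections(frames, values, tol):
    """Return indices of the knots for one coordinate of one point.

    frames -- increasing frame numbers (or time stamps)
    values -- coordinate observed at each frame
    tol    -- largest approximation error allowed within a section
    """
    def max_error(i, j):
        # Worst deviation of the samples strictly between i and j from
        # the straight line joining sample i and sample j.
        t0, t1 = frames[i], frames[j]
        v0, v1 = values[i], values[j]
        worst = 0.0
        for k in range(i + 1, j):
            pred = v0 + (v1 - v0) * (frames[k] - t0) / (t1 - t0)
            worst = max(worst, abs(values[k] - pred))
        return worst

    knots = [0]
    start = 0
    for end in range(2, len(frames)):
        # Frame `end` has just been obtained: try to extend the section.
        if max_error(start, end) > tol:
            start = end - 1          # close the section at the last good frame
            knots.append(start)
    knots.append(len(frames) - 1)
    return knots
```

Because the loop only ever looks at frames up to `end`, the same
routine can be driven incrementally as each frame arrives, or run
once after all frames are available.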

[0234] Reference numeral 203 shown in FIG. 39C represents the
approximated spline function expressed three-dimensionally.
Reference numeral 204 shown in FIG. 39D represents an example
of the spline function which is the output of the
representative point trajectory function approximating device
104 (only one axis of coordinate of one representative point
is shown). In this example, the approximation section is
divided into two sections (the number of knots is three) which
are t = 0 to 5 and t = 5 to 16.

[0235] The thus-obtained spline functions are record- 
ed in the region data storage device 106 in a predeter- 
mined data format. 

[0236] As described above, this embodiment enables the object
region in a video to be described as the parameter of a curve
approximating a time-sequential trajectory (a trajectory of
the coordinates of the representative points whose variable is
the frame number or the time stamp) of the representative
points of the approximate figure of the object region.
[0237] The object region in a video can be expressed 
by only the parameters of the approximate function. 
Therefore, object region data, the quantity of which is 
small and which can easily be handled, can be created. 
Also extraction of representative points from the approx- 
imate figure and determination of parameters of the ap- 
proximate curve can easily be performed. Moreover, re- 
production of an approximate figure from the parame- 
ters of the approximate curve can easily be performed. 
[0238] A method may be employed with which a basic figure, for
example, one or more ellipses, is employed as the approximate
figure and each ellipse is represented by two focal points and
another point. In the foregoing case, whether or not arbitrary
coordinates specified by a user exist in the region (the
approximate figure) of the object (whether or not the object
region has been specified) can be determined by a simple
decision formula. Thus, specification of a moving object in a
video can be performed more easily by the user.
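
The text does not spell out the test, but for an ellipse given by
two focal points and one point on its boundary, one natural
choice is the focal-distance property: a point lies inside the
ellipse when the sum of its distances to the two focal points
does not exceed the same sum for the boundary point. The sketch
below assumes that choice; the function name `inside_ellipse` is
hypothetical.

```python
import math


def inside_ellipse(f, g, h, p):
    """Return True when point p lies inside or on the ellipse whose focal
    points are f and g and which passes through point h."""
    limit = math.dist(h, f) + math.dist(h, g)    # equals 2a for this ellipse
    return math.dist(p, f) + math.dist(p, g) <= limit
```

With focal points (-4, 0) and (4, 0) and boundary point (0, 3),
the origin is reported as inside and the point (6, 0) as outside.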

[0239] The data format of object region data which is stored
in the region data storage device 106 will now be described. A
case will now be described in which the representative points
are approximated with a spline function. The representative
points are similarly approximated with another function.

[0240] FIG. 42 shows an example of the data format 
of object region data for describing the spline function 
indicating the object region in a video and information 
related to the object. 

[0241] ID number 400B is an identification number which is
given to each object. Note that ID number 400B may be omitted.

[0242] A leading end frame number 401B and a trailing end
frame number 402B are leading and trailing end frame numbers
for defining existence of the object having the ID number
400B. Specifically, the numbers 401B and 402B are the frame
number at which the object appears in the video and the frame
number at which the object disappears. The frame numbers are
not required to be the frame numbers at which the object
actually appears and disappears in the video. For example, an
arbitrary frame number after the appearance of the object in
the video may be the leading end frame number. An arbitrary
frame number which follows the leading end frame number and
which precedes the frame of disappearance of the object in the
video may be the trailing end frame number. The
leading/trailing end time stamp may be substituted for the
leading/trailing end frame number. The object existence frame
number or object existence time may be substituted for the
trailing end frame number 402B.

[0243] A pointer (hereinafter called a "related information
pointer") 403B for related information is the address or the
like of the data region in which data of information related
to the object having the foregoing ID number is recorded. When
the related information pointer 403B is used, retrieval and
display of information related to the object can easily be
performed. The related information pointer 403B may be a
pointer to data describing a program or the operation of a
computer. In the foregoing case, when the object has been
specified by a user, the computer performs a predetermined
operation.
[0244] Note that the related information pointer 403B may be
omitted when the objects are not required to perform
individual operations.

[0245] It is not necessary to have the related information
pointer 403B. As an alternative to using the pointer 403B,
related information itself may be described in the object
region data. Further, it is possible to have either the
related information pointer 403B or the related information
itself in the object region data. In this case, a flag is
required to indicate whether the related information pointer
403B or related information has been described in the object
region data.

[0246] The number of approximate figures 404B is the number of
the figures approximating the object region. In the example
shown in FIG. 39A in which the object region is approximated
with one ellipse, the number of the figures is 1.

[0247] Approximate figure data 405B is data (for example, the
parameter of a spline function) of a trajectory of the
representative point of the figure for expressing an
approximate figure.
[0248] Note that approximate figure data 405B exists by the
number corresponding to the number of approximate figures 404B
(a case where the approximate figure number 404B is two or
larger will be described later).
[0249] The number of the approximate figures 404B for object
region data may always be one (therefore, approximate figure
data 405B is also always one) to omit the field for the
approximate figure number 404B.
[0250] FIG. 43 shows the structure of approximate figure data
405B (see FIG. 42).
[0251] A figure type ID 1300B is identification data for
indicating the type of the approximate figure, the figure type
ID 1300B identifying a circle, an ellipse, a rectangle, and a
polygon.

[0252] The number of representative points 1301B indicates the
number of representative points of the figure specified by the
figure type ID 1300B. Note that the number of the
representative points is expressed with M.

[0253] A set of representative point trajectory data items
1302B, 1303B, and 1304B are data regions relating to the
spline function for expressing the trajectory of the
representative points of the figure. The representative points
of one figure require data of one set of spline functions for
the X-, Y-, and Z-coordinates. Therefore, data of the
trajectory of the representative points for specifying the
spline function exists by representative point number (M) x 3.

[0254] The Z-coordinate of the representative point can be
obtained by using the methods shown in FIGS. 18 to 22 or any
other methods.

[0255] Note that the type of the employed approximate figure
may previously be limited to one type, for example, an
ellipse. In the foregoing case, the field for the figure type
ID 1300B shown in FIG. 42 may be omitted.

[0256] When the representative point number is de- 
fined according to the figure type ID 1300B, the repre- 
sentative point number may be omitted. 
[0257] FIG. 44 shows an example of the structure of
representative point trajectory data 1302B, 1303B, and 1304B.

[0258] A knot frame number 1400B indicates a knot of the
spline function; that is, it indicates that polynomial data
1403B is valid up to that knot. The number of coefficient data
1402B of the polynomial varies according to the highest order
of the spline function (assuming that the highest order is K,
the number of coefficient data is K + 1). Therefore, reference
to a polynomial order 1401B is made. The polynomial order
1401B is followed by K + 1 polynomial coefficients 1402B.

[0259] Since the spline function is expressed by an individual
polynomial between the knots, the polynomials are required by
the number corresponding to the number of knots. Therefore,
data 1403B including the knot frame number 1400B and the
coefficients of the polynomial 1402B is described repeatedly.
When the knot frame number is the same as the trailing end
frame number, the data is the last polynomial coefficient
data. Therefore, termination of representative point
trajectory data can be recognized.
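
The repeated polynomial data can be modelled as a list of records
and evaluated by locating the record whose knot covers the
requested frame. The following is a sketch only: the class name
`PolynomialData`, the function name `evaluate`, the use of the
absolute frame number as the polynomial variable, and the
lowest-order-first ordering of coefficients are all assumptions,
since the text does not fix them.

```python
from typing import List, NamedTuple


class PolynomialData(NamedTuple):
    """One repeated record, standing in for polynomial data 1403B."""
    knot_frame: int            # knot frame number 1400B
    order: int                 # polynomial order 1401B (K)
    coefficients: List[float]  # K + 1 coefficients 1402B, lowest order first


def evaluate(trajectory: List[PolynomialData], frame: int) -> float:
    """Evaluate one coordinate of one representative point at `frame`."""
    for piece in trajectory:
        if len(piece.coefficients) != piece.order + 1:
            raise ValueError("expected K + 1 coefficients")
        if frame <= piece.knot_frame:
            # This polynomial is valid up to its knot frame.
            return sum(c * frame ** i
                       for i, c in enumerate(piece.coefficients))
    raise ValueError("frame lies beyond the trailing end frame")
```

A reader of the stored data would stop consuming records once a
knot frame number equal to the trailing end frame number has been
read, which is why the list needs no explicit length field.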

[0260] FIG. 43 shows that the depth information is described
for each of the representative points. However, it is possible
to describe the depth information for each object region, as
shown in FIG. 17 in the second embodiment. FIG. 45 shows
object region data having one item of depth information per
object region. The approximate figure data includes depth
information 1306B in addition to the figure type ID 1300B, the
representative point number 1301B, and a pair of
representative point trajectory data 1302B and 1303B.
[0261] A case will now be described in which a figure 
except for the ellipse is employed as the approximate 
figure. 

[0262] FIG. 46 is a diagram showing the representative points
in a case where a parallelogram is employed as the approximate
figure. Points A, B, C and D are vertices of the
parallelogram. Once three of the four vertices are determined,
the remaining one is determined uniquely. Therefore, only
three of the four vertices are required to serve as the
representative points. In the foregoing example, three points,
which are A, B and C, are employed as the representative
points.
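
Recovering the fourth vertex from the three representative points
is a single vector identity, D = A + C - B, provided the vertices
are labelled in order around the parallelogram (an assumption
about FIG. 46). A minimal sketch, with the hypothetical function
name `fourth_vertex`:

```python
def fourth_vertex(a, b, c):
    """Return vertex D of parallelogram ABCD given A, B and C as (x, y)
    tuples, with the vertices taken in order around the figure."""
    return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])
```

The identity follows from the fact that the diagonals AC and BD
of a parallelogram share the same midpoint.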
[0263] The examples have been described with which 
one figure is assigned to one object to roughly approx- 
imate the object region. The accuracy of approximation 
may be improved by approximating one object region 
with a plurality of figures. FIG. 47 shows an example in 
which a plurality of figures approximate one object re- 
gion. In the foregoing case, a region of a person in the 
image is expressed with 6 ellipses 600B to 605B. 
[0264] When one object is approximated with plural figures as
shown in FIG. 47, a process for dividing the object region
into a plurality of regions must be performed. The process may
be performed by an arbitrary method. For example, a method
with which the object is divided directly by hand may be
employed. In the foregoing case, a pointing device, such as a
mouse, is used to enclose the region on the image with a
rectangle or an ellipse. Alternatively, the region is
specified with a trajectory of the pointing device. When an
automatic method is employed as a substitute for the manual
operation, a method may be employed with which clustering of
movement of the object is performed to realize the division.
The foregoing method is a method with which the movement of
each region in the object among the successive frames is
determined by a correlation method (refer to, for example,
Image Analysis Handbook Chapter-3, Section II, Publish
Conference of Tokyo University, 1991) or a method with
gradient constraints (refer to, for example, Determining
optical flow, B. K. P. Horn and B. G. Schunck, Artificial
Intelligence, Vol. 17, pp. 185-203, 1981) to collect similar
movements to form a region.

[0265] Each of the divided regions is subjected to the above
process so that data of the approximate figure is created. In
the foregoing case, the number of spline functions which must
be described in object region data of one object increases as
the number of the approximate figures increases. Therefore,
the structure of data is formed which includes approximate
figure data 405B by the number (L in the foregoing case)
corresponding to the approximate figure number 404B, as shown
in FIG. 48.
[0266] As described above, the field for the approxi- 
mate figure number 404B may be omitted by making the 
approximate figure number to always be one (therefore, 
data of the approximate figure is made to always be one) 
to the object region data. In the foregoing case, one ob- 
ject can be expressed with a plurality of figures when 
object region data is produced for each figure approxi- 
mating one object (the same ID number is given). 
[0267] When one object is approximated with a plurality of
figures in this embodiment, the same type of figure is
employed. However, a mixture of a plurality of types of
figures may be employed to approximate the object region.
[0268] Although the method of approximation using 
the ellipse has been described, an approximation meth- 
od using a rectangle will now be described as another 
approximation method. 

[0269] FIGS. 49A, 49B, and 49C are diagrams formed into the
same shape as that of FIGS. 39A, 39B, and 39C. In the
foregoing case, the region figure approximating device 102
employs a method of approximating a region with a rectangle.
The representative point extracting device 103 employs a
method of extracting the four vertices of the rectangle. The
representative point trajectory function approximating device
104 employs an approximation method using a spline function.
[0270] Referring to FIG. 49A, reference numeral 2800B
represents video data for one frame which is to be processed.

[0271] Reference numeral 2801B represents an object region
which is to be extracted. A process for extracting the region
2801B of the object is performed by the region extracting
device 101.
[0272] Reference numeral 2802B represents a result of
approximation of the object region with the rectangle. The
process for obtaining the rectangle 2802B from the object
region 2801B is performed by the region figure approximating
device 102.

[0273] An example of the process for obtaining the rectangle
2802B shown in FIG. 49A is shown in FIG. 50. That is, a mask
image of the frame 2800B is raster-scanned (step S60B). When
the subject pixel is included in the object region (step S61),
then, for each of the X- and Y-coordinates, the stored minimum
value is updated if the coordinate is smaller than it, and the
stored maximum value is updated if the coordinate is larger
than it (step S62B).

[0274] The foregoing process is repeated and checked for all
of the pixels so that the minimum and maximum values of the
pixel position indicating the object region 2801B for each of
the X- and Y-coordinates are obtained. Thus, the coordinates
of the four vertices of the rectangle 2802B can be obtained.
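
The raster scan described above can be sketched as follows. The
mask image is assumed to be a list of rows whose entries are
nonzero inside the object region; the function name
`bounding_rectangle` and the returned vertex order are choices
made for the example.

```python
def bounding_rectangle(mask):
    """Return the four vertices of the rectangle enclosing all nonzero
    pixels of `mask`, or None when the mask contains no object pixel."""
    min_x = min_y = max_x = max_y = None
    for y, row in enumerate(mask):              # raster scan, row by row
        for x, pixel in enumerate(row):
            if not pixel:                       # pixel outside the region
                continue
            if min_x is None:                   # first object pixel found
                min_x = max_x = x
                min_y = max_y = y
                continue
            min_x, max_x = min(min_x, x), max(max_x, x)
            min_y, max_y = min(min_y, y), max(max_y, y)
    if min_x is None:
        return None
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
```

A single pass over the mask is sufficient, which is what makes
the rectangle approximation inexpensive compared with fitting an
ellipse.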
[0275] Although the above-mentioned approximating method using
the rectangle has the advantage of a simple process, it is
sometimes desirable to approximate the object region with the
ellipse. FIG. 51 shows that an approximate ellipse is obtained
from the rectangle representing the object region. FIG. 52
shows the process of obtaining the approximate ellipse.

[0276] Referring to FIG. 51, it is assumed that an object
region 3300B and a circumscribing rectangle 3301B have been
obtained.

[0277] Initially, the inscribing ellipse and the
circumscribing ellipse of the approximate rectangle 3301B are
obtained (step S80B).

[0278] Referring to FIG. 51, an ellipse 3302B is an inscribing
ellipse of the rectangle 3301B and the ellipse 3303B is a
circumscribing ellipse of the rectangle 3301B.

[0279] Then, the size of the inscribing ellipse 3302B is
gradually brought closer to that of the circumscribing ellipse
3303B (step S81B). Then, an ellipse 3304B that completely
includes the object region 3300B is obtained (step S82B) and
employed as the approximate ellipse. The unit for enlarging
the size of the inscribing ellipse 3302B in each iteration of
the repeated process may be determined in advance.
Alternatively, the unit may be determined in accordance with
the difference between the size of the inscribing ellipse
3302B and that of the circumscribing ellipse 3303B.
[0280] A reverse method may be employed with which the size of
the circumscribing ellipse 3303B is brought closer to the size
of the inscribing ellipse 3302B. In the foregoing case, the
circumscribing ellipse 3303B includes the object region 3300B
from the first. Therefore, the ellipse immediately preceding
the first ellipse in the repeated process that fails to
include a portion of the object region 3300B is employed as
the approximate ellipse 3304B.

[0281] An example will now be described in which, when a
trajectory of the object region is described by the method
according to embodiments of the present invention, a structure
of data which is different from the approximate data structure
shown in FIGS. 42 and 43 is employed.

[0282] FIGS. 52 and 53 show another example of a description
format for data of the approximate figure and data of
trajectories of representative points of the object region.
Note that FIGS. 52 and 53 show only one representative point
for a section (section from knot number N 3902B to a function
specifying information arrangement 3923B) of data of the
trajectory of the representative point (in practice, a
plurality of representative points are described to correspond
to the number of the representative points).

[0283] Description will now be made on the assump- 
tion that the highest order of the polynomial is the sec- 
ond order. 

[0284] In the foregoing example (shown in FIGS. 41, 42, and
43), all of the coefficients of the polynomial spline function
are described. The description method in this example is
arranged to use a combination of the coordinate of the knot of
the spline function and a value relating to the second-order
coefficient of the spline function. The foregoing description
method has an advantage that the knot can easily be extracted
to cause the trajectory of a large object to easily be
detected.
[0285] The foregoing description method will now be 
described. 

[0286] The figure type ID 3900B shown in FIG. 53 specifies the
type of the figure which has been used to approximate the
shape of an object. For example, only the centroid, the
rectangle, the ellipse or their combination can be specified.
The number of representative points 3901B indicates the number
of the trajectories of the representative points which are
determined in accordance with the type of the figure.
[0287] The knot number N 3902B indicates the 
number of knots of a spline function expressing the tra- 
jectory of the representative point. The frame corre- 
sponding to each knot is expressed as time so as to be 
stored in knot time (1) to knot time (N) 3903B. Since a 
predetermined number of knot times have been provid- 
ed, the knot times are described as knot time arrange- 
ment 3904B. 

[0288] Also, the X-, Y-, and Z-coordinates of each knot are
described as arrangements 3906B, 3908B, and 3910B of
X-coordinates of knots 3905B, Y-coordinates of knots 3907B,
and Z-coordinates of knots 3909B.
[0289] A linear function flag 3910B indicates whether or not
only a linear function is employed as the spline function
between knots. If a second or higher order polynomial is
partially employed, the linear function flag 3910B is turned
off. Since the linear function flag 3910B is employed,
description of function specifying information 3912B, 3916B,
and 3920B to be described later can be omitted when only the
linear function is employed as the approximate function.
Therefore, an advantage can be realized in that the quantity
of data can be reduced. Note that the flag 3910B may be
omitted.



[0290] Function IDs 3913B, 3917B, and 3921B and function
parameters 3914B, 3918B, and 3922B contained in function
specifying information 3912B, 3916B, and 3920B indicate the
order of the polynomial spline function and information for
specifying the coefficient of the polynomial spline function,
respectively.
[0291] The number of function parameters 3914B, 3918B, and
3922B for X-, Y-, and Z-coordinates are (knot number - 1) so
that they are described as the arrangements 3915B, 3919B, and
3923B.
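
To illustrate why knot coordinates plus one value per section can
be sufficient for a second-order spline, the sketch below
reconstructs one coordinate inside a section from its two knots
and a second-order coefficient. It assumes that the stored
function parameter is the coefficient of t^2 itself; the text
only says that the parameter is "a value relating to" that
coefficient, so the actual encoding may differ. The function name
`evaluate_section` is hypothetical.

```python
def evaluate_section(t, t_a, x_a, t_b, x_b, c2=0.0):
    """Evaluate the section between knots (t_a, x_a) and (t_b, x_b) at t.

    c2 is the assumed second-order coefficient; c2 = 0 gives the purely
    linear case signalled by the linear function flag.
    """
    # Straight line through the two knots.
    linear = x_a + (x_b - x_a) * (t - t_a) / (t_b - t_a)
    # The added term vanishes at both knots and contributes c2 * t**2.
    return linear + c2 * (t - t_a) * (t - t_b)
```

Since each section needs exactly one such value, a trajectory
with N knots needs N - 1 of them, which agrees with the count of
function parameters given above.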

[0292] Although the description has been made that 
the highest order of the polynomial is the quadratic or- 
der, the highest order of the polynomial may, of course, 
be a cubic or higher order. 

[0293] FIGS. 52 and 53 show that the depth information is
described for each of the representative points. However, it
is possible to describe the depth information for each object
region, as shown in FIG. 17 in the second embodiment. FIGS. 54
and 55 show object region data having one item of depth
information per object region.

[0294] FIG. 55 shows the object region data having the figure
type ID 700B, object appearing time 701B, object existing time
period 702B, number of representative points M 703B,
representative point trajectory 704B, and depth information
705B, in the same manner as FIG. 8. FIG. 56 shows the
representative point trajectory 704B which is obtained by
excluding the figure type ID 3900B, representative point
number 3901B, arrangement of knot Z 3910B, and arrangement of
function specifying information Z 3923B.
[0295] Though the sixth embodiment adds the depth information
to the object region which is described using the trajectory
of the coordinates of the representative points of the
approximate figure, it is possible to add the display flag,
passing range information, and panorama conversion parameters
for mosaicking to the above described object region data.

Seventh embodiment

[0296] In the seventh embodiment, the depth information is
added to another object region data. The object region data in
an arbitrary frame of the seventh embodiment is described by a
reference object region data in a reference frame and a
conversion parameter indicating the conversion from the
reference object region to an object region in the arbitrary
frame.

[0297] The configuration of the object region data creating
apparatus of the seventh embodiment is shown in FIG. 57. The
object region data creating apparatus comprises a video data
storage device 2C, object region processing device 4C,
conversion parameter calculation device 6C, function
approximation device 8C, and object region data storage
device 10C.
[0298] The video data storage device 2C stores video data. The
device 2C is constituted by, for example, a hard disk device,
an optical disk device or a semiconductor memory. It is noted
that the video data storage device 2C is not necessarily
located at the same site as that of the other devices and may
be located remotely through the Internet or the like.
[0299] The object region processing device 4C executes a
processing for obtaining an object region in a frame serving
as a reference (reference object region) and an object region
in a frame serving as a target (target object region).

[0300] The conversion parameter calculation device 
6C executes a processing for calculating the conversion 
parameters of the target object region based on the ref- 
erence object region. 

[0301] The function approximation device 8C executes a
processing for approximating a time series trajectory by a
temporal function for each conversion parameter of the object
region. As will be described later, if the conversion
parameters themselves are described, this function
approximation device 8C is not necessary.
[0302] The object region data storage device 10C 
stores object region data including data for expressing 
a functional formula approximating the time series tra- 
jectory for each conversion parameter. 
[0303] Although it is preferable that the reference ob- 
ject region is updated, a device relating to the update 
processing is not shown in FIG. 57. 
[0304] The video data storage device 2C and the ob- 
ject region data storage device 10C may be constituted 
by individual storage devices or media. Alternatively, all 
of or part of these devices may be constituted by a com- 
mon storage device or medium. 
[0305] This object region data creating apparatus can also be
realized by executing software on a computer.
[0306] FIG. 58 shows one example of the processing procedure
of the object region data creating apparatus according to this
embodiment.
[0307] First, in step S101C, object regions in all frames in a
video are inputted (while assuming that object regions are
known). If the object regions are manually input through a
GUI, the contour of an object serving as a processing target
in the video is specified by a pointing device such as a mouse
or a touch panel. The interior of the contour of the object
inputted manually may be set as an object region.
Alternatively, after fitting an inputted contour to the
contour line of the object in an image by means of a technique
using a dynamic outline model referred to as Snakes (see, for
example, M. Kass, A. Witkin and D. Terzopoulos, "Snakes:
Active contour models", Proceedings of the 1st International
Conference on Computer Vision, pp. 259-268, 1987), the
interior of the contour thus fitted may be set as an object
region. Instead of manually inputting the contour, the object
regions may be obtained automatically by image processing. If
data relating to the object regions are already present, it is
possible to input such data.
[0308] At least one of these object regions is registered as a
reference object region. To register the object region, there
is proposed a method including generating and storing a binary
bit map on which "1" corresponds to the interior of each
object region and "0" corresponds to the outside of the
region.
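
As an illustration of such a binary bit map, the following is a minimal sketch that rasterizes a closed contour into a mask of 1s and 0s. The function name `contour_to_mask`, the even-odd inside test, and the choice of testing pixel centers are assumptions made for this example; they are not taken from the specification.

```python
def contour_to_mask(contour, width, height):
    """Binary bit map: 1 inside the closed contour, 0 outside (even-odd rule)."""
    mask = [[0] * width for _ in range(height)]
    n = len(contour)
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # test the pixel center
            inside = False
            for i in range(n):
                (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
                # Count contour edges crossed by a ray going right from the pixel.
                if (y1 > py) != (y2 > py):
                    if px < x1 + (py - y1) * (x2 - x1) / (y2 - y1):
                        inside = not inside
            mask[y][x] = int(inside)
    return mask


# Hypothetical contour: an axis-aligned rectangle given by its four vertexes.
mask = contour_to_mask([(1, 1), (4, 1), (4, 3), (1, 3)], width=5, height=4)
```

The per-pixel test is quadratic in image size and contour length, which is acceptable for a sketch; a scan-line fill would be the usual choice for full video frames.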
[0309] Further, a frame including the reference object region
is registered as a reference frame.

[0310] Next, in step S102C, a conversion parameter for
converting the reference object region into an object region
in one frame serving as a processing target (to be referred to
as "target object region" hereinafter) is calculated.



[0311] This processing can be realized by a combination of,
for example, a processing for obtaining an optical flow in the
target object region and a processing for converting the
optical flow into the conversion parameter. The optical flow
here is the movement of the pixels in the object region from
the reference frame to a present frame.



[0312] FIG. 59 shows the schematic of a processing 
example for obtaining an optical flow in the object region 
in each frame. 

[0313] In FIG. 59, reference symbol 201C denotes a reference
frame, 202C denotes the next frame to the reference frame, and
203C denotes the next frame to the frame 202C. Reference
symbols 204C, 205C and 206C denote object regions in the
respective frames. Reference symbol 207C denotes the optical
flow of the object region from the frame 201C to the frame
202C. Reference symbol 208C denotes the optical flow of the
object region from the frame 201C to the frame 203C.
[0314] As can be seen, the optical flow obtaining
shows the schematic of a processing example in the latter
case.

[0316] In FIG. 60, reference symbol 301C denotes a reference
frame, 302C denotes the next frame to the reference frame, and
303C denotes the next frame to the frame 302C. Reference
symbols 304C, 305C and 306C denote object regions in the
respective frames. Reference symbol 307C denotes the optical
flow of the object region from the frame 301C to the frame
302C. Reference symbol 308C denotes the optical flow of the
object region from the frame 302C to the frame 303C.
[0317] If optical flows are calculated as shown in FIG. 60,
parameter variations become smaller than those in the method
of FIG. 59. However, the calculation of the object region in
an arbitrary frame is more complex than in the method of FIG.
59. While either the method shown in FIG. 59 or that shown in
FIG. 60 may be employed, description will be continued while
assuming that the optical flow is calculated by the method
shown in FIG. 59 in this embodiment.
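
One way to see why the method of FIG. 60 makes the calculation in an arbitrary frame more complex: conversions obtained between successive frames have to be chained to get back to the reference frame. The sketch below assumes, purely for illustration, that each frame-to-frame conversion is an affine map of the form x' = a0 x + a1 y + a2, y' = a3 x + a4 y + a5 (the affine conversion model of formula (5) introduced later in this description). The function name `compose_affine` is hypothetical.

```python
def compose_affine(first, second):
    """Parameters of the affine map that applies `first`, then `second`."""
    a0, a1, a2, a3, a4, a5 = first
    b0, b1, b2, b3, b4, b5 = second
    return (b0 * a0 + b1 * a3, b0 * a1 + b1 * a4, b0 * a2 + b1 * a5 + b2,
            b3 * a0 + b4 * a3, b3 * a1 + b4 * a4, b3 * a2 + b4 * a5 + b5)


# Hypothetical values: frame 301C -> 302C, then frame 302C -> 303C.
step1 = (1, 0, 2, 0, 1, 3)    # translation by (2, 3)
step2 = (1, 0, 5, 0, 1, -1)   # translation by (5, -1)
total = compose_affine(step1, step2)  # frame 301C -> 303C
```

With the method of FIG. 59 no such chaining is needed, because every conversion is already expressed relative to the reference frame.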

[0318] Many methods for obtaining an optical flow have been
already proposed (see, for example, J. L. Barron, D. J. Fleet
and S. S. Beauchemin, "Performance of Optical Flow
Techniques", International Journal of Computer Vision, vol.
12, no. 1, pp. 43-77, 1994). It is possible to adopt any
method to obtain an optical flow.
[0319] It is also possible to select a plurality of
characteristic points in the reference object region and to
use a moving vector obtained by template matching with blocks
centered around the characteristic points used as templates.
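
The template matching mentioned above can be sketched as an exhaustive block search. This is a minimal example under stated assumptions: images are small 2D lists of gray levels, the matching cost is the sum of absolute differences (SAD), and the names `match_block`, `half`, and `search` are hypothetical. The specification does not prescribe a particular cost or search range.

```python
def match_block(ref, cur, cx, cy, half=1, search=2):
    """Moving vector (dx, dy) of the block centered at (cx, cy), by minimum SAD."""
    def sad(dx, dy):
        return sum(abs(ref[cy + j][cx + i] - cur[cy + dy + j][cx + dx + i])
                   for j in range(-half, half + 1)
                   for i in range(-half, half + 1))

    h, w = len(cur), len(cur[0])
    # Keep only displacements whose block stays fully inside the current frame.
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)
                  if half <= cx + dx < w - half and half <= cy + dy < h - half]
    return min(candidates, key=lambda d: sad(*d))


# Hypothetical 7x7 frames: a small pattern moves by (+2, +1) between frames.
ref = [[0] * 7 for _ in range(7)]
cur = [[0] * 7 for _ in range(7)]
ref[2][2], ref[2][3], ref[3][2] = 9, 5, 3
cur[3][4], cur[3][5], cur[4][4] = 9, 5, 3
vector = match_block(ref, cur, cx=2, cy=2)
```

The moving vectors found this way for several characteristic points can then be used in place of a dense optical flow when the conversion parameters are calculated.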

[0320] Next, a processing for calculating a conversion 
parameter from an optical flow is executed. It is noted 
that a conversion parameter to be obtained varies ac- 
cording to conversion models which the parameters are 
based on. 

[0321] In this embodiment, the following models can be
selected:

"Enlargement/reduction model" and "Rotation model" as models
when the number of parameters is 1;
"Parallel translation model" as a model when the number of
parameters is 2;
"Composite model of enlargement & reduction/rotation/parallel
translation models" (referred to herein as "4-parameter
conversion model") as a model when the number of parameters
is 4;
"Affine conversion model" as a model when the number of
parameters is 6;
"Projection conversion model" as a model when the number of
parameters is 8; and
"Parabolic conversion model" as a model when the number of
parameters is 12.

[0322] The respective models are expressed by the 
following mathematical formulas (1) to (7): 



$$x' = a_0 x, \qquad y' = a_0 y \tag{1}$$

$$x' = x\cos a_0 - y\sin a_0, \qquad y' = x\sin a_0 + y\cos a_0 \tag{2}$$

$$x' = x + a_0, \qquad y' = y + a_1 \tag{3}$$

$$x' = a_0 x + a_1 y + a_2, \qquad y' = a_1 x - a_0 y + a_3 \tag{4}$$

$$x' = a_0 x + a_1 y + a_2, \qquad y' = a_3 x + a_4 y + a_5 \tag{5}$$

$$x' = \frac{a_0 x + a_1 y + a_2}{a_3 x + a_4 y + 1}, \qquad y' = \frac{a_5 x + a_6 y + a_7}{a_3 x + a_4 y + 1} \tag{6}$$

$$x' = a_0 x^2 + a_1 xy + a_2 y^2 + a_3 x + a_4 y + a_5, \qquad y' = a_6 x^2 + a_7 xy + a_8 y^2 + a_9 x + a_{10} y + a_{11} \tag{7}$$



[0323] The mathematical formula (1) corresponds to the
enlargement and reduction model, the mathematical formula (2)
corresponds to the rotation model, the mathematical formula
(3) corresponds to the parallel translation model, the
mathematical formula (4) corresponds to the 4-parameter
conversion model, the mathematical formula (5) corresponds to
the affine conversion model, the mathematical formula (6)
corresponds to the projection conversion model, and the
mathematical formula (7) corresponds to the parabolic
conversion model. In the formulas, (x, y) denotes coordinates
in the reference object region, and (x', y') denotes the
coordinates of the corresponding point of the object in the
target object region. In the respective conversion models, it
is assumed that the relationship between corresponding points
in the two frames can be expressed using parameters a_0 to
a_11 as shown in the formulas. Needless to say, a parametric
model other than the above-described models may be prepared.
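
To make the role of the parameters concrete, the following minimal sketch maps a point (x, y) of the reference object region to (x', y') for a subset of the models: enlargement/reduction, parallel translation, affine conversion, and projection conversion. The function name `apply_model` and the string keys are hypothetical; the rotation, 4-parameter, and parabolic models are left out for brevity.

```python
def apply_model(model, a, x, y):
    """Map (x, y) in the reference object region to (x', y') using parameters a."""
    if model == "scale":        # enlargement/reduction model, 1 parameter
        return a[0] * x, a[0] * y
    if model == "translation":  # parallel translation model, 2 parameters
        return x + a[0], y + a[1]
    if model == "affine":       # affine conversion model, 6 parameters
        return a[0] * x + a[1] * y + a[2], a[3] * x + a[4] * y + a[5]
    if model == "projection":   # projection conversion model, 8 parameters
        w = a[3] * x + a[4] * y + 1.0
        return (a[0] * x + a[1] * y + a[2]) / w, (a[5] * x + a[6] * y + a[7]) / w
    raise ValueError("unsupported model: " + model)


moved = apply_model("translation", [2, 3], 1, 1)
```

Dispatching on a model identifier mirrors the way a conversion model ID selects the formula when the object region data is read back.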

[0324] To calculate the conversion parameter, a method of
least squares can be employed. This method determines the
conversion parameter so that the sum of the squares of the
errors generated when the combinations of (x, y) and (x', y')
obtained by the optical flow are substituted into the
conversion model mathematical formula becomes a minimum. This
is an old, conventional method and can be easily executed by
matrix operation.
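
As a sketch of this least-squares step, the example below estimates the six parameters of the affine conversion model from pairs of corresponding points by solving the 3x3 normal equations. The helper names `solve3` and `fit_affine` are hypothetical, and the choice of the affine model is an assumption for illustration; the same idea applies to the other models that are linear in their parameters.

```python
def solve3(m, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares a0..a5 from point pairs (x, y) -> (x', y')."""
    m = [[0.0] * 3 for _ in range(3)]
    bx, by = [0.0] * 3, [0.0] * 3
    for (x, y), (xp, yp) in zip(src, dst):
        v = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += v[i] * v[j]   # accumulate the normal matrix
            bx[i] += v[i] * xp           # right-hand side for x'
            by[i] += v[i] * yp           # right-hand side for y'
    return solve3(m, bx) + solve3(m, by)


# Hypothetical correspondences generated from known parameters.
true_a = [2.0, 0.5, 3.0, -1.0, 1.5, 4.0]
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(true_a[0] * x + true_a[1] * y + true_a[2],
        true_a[3] * x + true_a[4] * y + true_a[5]) for x, y in src]
estimated = fit_affine(src, dst)
```

The sketch needs at least three points that are not collinear; otherwise the normal matrix is singular and the division in `solve3` fails.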
[0325] Next, in step S103C, the calculated conversion
parameter of the object region is converted to (approximated
by) a temporal function.
[0326] Namely, n conversion parameters a_j (0 ≤ j ≤ n-1)
(e.g., n = 12) in a certain time interval are expressed by:

$$a_j = f_j(t),$$

where f_j(t) is a function of time t.
[0327] The time interval here is one including the frames for
which an optical flow is calculated using the same reference
object region. f_j(t) may be a polynomial, a spline function,
a constant or the like.
[0328] FIG. 61 shows a state in which a certain conversion
parameter a_j calculated from the optical flow is expressed by
a function. In FIG. 61, reference symbol 401C denotes a time
interval in which a function is to be obtained, 402C denotes a
value of a_j calculated from the optical flow, and 403C
denotes a function a_j = f_j(t) expressing the parameter a_j.

[0329] The advantage of expressing the parameter a_j by a
function is that the quantity of data for describing object
regions can be reduced. If a polynomial of second degree or
lower is used as a function, for example, three real numbers
suffice to describe all parameter values in a certain time
interval since this function can be expressed by three real
numbers.
[0330] If a polynomial or a spline function is used as a
function expressing the conversion parameter, the function
f_j(t) is determined so that the error between the values of
a_j in the conversion target time interval and the values
calculated by the function f_j(t) becomes small. By using,
for example, the method of least squares, the parameters of
the function can be easily calculated.
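
A minimal sketch of this fitting step, assuming a second-degree polynomial and ordinary least squares through the normal equations. The names `fit_polynomial` and `evaluate` are hypothetical and the sample values are made up for the example.

```python
def fit_polynomial(ts, values, degree=2):
    """Least-squares coefficients c[0..degree] of f(t) = sum c[k] * t**k."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T a, with V[i][k] = ts[i] ** k.
    m = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(a * t ** i for t, a in zip(ts, values)) for i in range(n)]
    aug = [row + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def evaluate(coeffs, t):
    """Value of the fitted temporal function at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


# Hypothetical per-frame values of one conversion parameter.
frames = [0, 1, 2, 3, 4]
samples = [1 + 2 * t + 0.5 * t * t for t in frames]
coeffs = fit_polynomial(frames, samples)
```

Only the three coefficients need to be stored for the interval; `evaluate` recovers the parameter value at any time inside it.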
[0331] This processing for obtaining an approximate 
function may be executed every time parameter values 
relating to the object region in each frame are obtained 
(e.g., a method of executing approximation and obtain- 
ing an approximate error every time parameter values 
in each frame are obtained, and appropriately dividing 
an approximate interval so that the approximate error 
may fall within a certain range). Alternatively, this 
processing may be executed simultaneously for all 
frames after the reference object region is updated and 
a reference frame interval is decided. 
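
The first alternative, dividing the approximation interval whenever the error leaves a permitted range, can be sketched as a greedy procedure. For brevity this example approximates each interval by a straight line, which is an assumption of the sketch (the description also allows higher-order polynomials and spline functions). The names `fit_line`, `split_intervals`, and `tol` are hypothetical, and frame times are assumed to be distinct.

```python
def fit_line(ts, vs):
    """Least-squares line v = c0 + c1 * t (closed form)."""
    n = len(ts)
    if n == 1:
        return vs[0], 0.0
    mt, mv = sum(ts) / n, sum(vs) / n
    var = sum((t - mt) ** 2 for t in ts)
    c1 = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / var
    return mv - c1 * mt, c1


def split_intervals(ts, vs, tol):
    """Grow each interval frame by frame until the max error would exceed tol."""
    intervals, start = [], 0
    for end in range(1, len(ts) + 1):
        c0, c1 = fit_line(ts[start:end], vs[start:end])
        err = max(abs(c0 + c1 * t - v)
                  for t, v in zip(ts[start:end], vs[start:end]))
        if err > tol:
            # Close the interval just before the frame that broke the tolerance.
            intervals.append((ts[start], ts[end - 2]))
            start = end - 1
    intervals.append((ts[start], ts[-1]))
    return intervals


# Hypothetical parameter values: rising for four frames, then falling.
pieces = split_intervals(list(range(8)), [0, 1, 2, 3, 3, 2, 1, 0], tol=0.1)
```

Each returned pair is the first and last frame time of one approximation interval; a separate function would then be stored per interval.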
[0332] The processing procedure of step S103C will 
be described in detail later. 

[0333] Next, in step S104C, it is determined whether or not it
is necessary to update the reference object region.

[0334] In this embodiment, an object region in an arbitrary
frame is expressed by the reference object region in the
reference frame and the conversion parameter of the reference
object region. However, if an object region to be expressed
differs too greatly in shape from the reference object region,
a shape similar to the object region to be expressed cannot be
obtained even by moving/deforming the reference object region
by the conversion parameter. In that case, it is effective to
change the reference object region to an object region in
another frame (to update the reference object region). In this
embodiment, therefore, it is determined whether or not such a
change is necessary in step S104C.
[0335] To make this determination, it is possible to employ a
method of determining whether or not the error between an
actual object region in a certain frame and a predicted object
region exceeds a preset threshold value. The predicted object
region means an object region in a certain frame which is
calculated from the reference object region by using the
conversion parameter. The conversion parameter used for
conversion is a value calculated from the temporal function
a_j = f_j(t). As the error between the actual object region
and the predicted object region, a ratio of the area of a
common portion to both regions to the area of a part which is
not common can be used.
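
The area ratio described above can be sketched for two binary bit maps of equal size. The function names and the threshold value are hypothetical. Note one interpretive assumption: the ratio as literally described (common area divided by non-common area) grows as the two regions agree better, so this sketch flags an update when the ratio falls below the threshold; using the inverse ratio and testing whether it exceeds a threshold would be equivalent.

```python
def overlap_ratio(actual, predicted):
    """Area common to both masks divided by the area covered by only one."""
    common = differing = 0
    for row_a, row_p in zip(actual, predicted):
        for a, p in zip(row_a, row_p):
            common += a & p      # pixel inside both regions
            differing += a ^ p   # pixel inside exactly one region
    return float("inf") if differing == 0 else common / differing


def needs_update(actual, predicted, threshold):
    """Assumed decision rule: update when the regions agree too poorly."""
    return overlap_ratio(actual, predicted) < threshold


# Hypothetical 2x3 bit maps of the actual and predicted object regions.
actual = [[1, 1, 0], [1, 1, 0]]
predicted = [[0, 1, 1], [0, 1, 1]]
ratio = overlap_ratio(actual, predicted)
```

Here two pixels are common and four belong to only one region, so the regions are judged against the threshold with a ratio of one half.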

[0336] Next, in step S105C, if it is determined at step S104C
that it is necessary to update the reference object region, a
reference object region update processing is executed. This
processing is basically the same as the processing executed in
step S101C. That is to say, in the processing in step S105C,
the processing target frame for which the conversion parameter
is calculated in step S102C is registered as a reference
frame, and a binary bit map expressing the reference object
region is generated. Further, an object region in the
reference frame is registered as a reference object region.
[0337] In step S106C, it is determined whether or not a
processing for describing the object regions in the video is
ended. This determination is based on, for example, whether or
not a current object region is at the final frame of the
video, whether or not a current object region is at the final
frame of an object region existing time-interval, whether or
not a user indicates the end of the description processing or
the like. The processings from steps S102C to S104C or S105C
are repeatedly executed for each frame until it is determined
that the description processing is ended in step S106C.
[0338] In step S107C, information on the description of the
object region (parameters of the function approximating the
conversion parameter) calculated by the preceding processings
is recorded according to a predetermined description format.
The information is recorded by the object region data storage
device 10C such as, for example, a semiconductor memory inside
or outside of a computer, a magnetic tape, a magnetic disk or
an optical disk.

[0339] FIG. 62 shows one example of an object region 
description format with respect to one object region in 
this embodiment. 
[0340] In FIG. 62, reference symbol 501C denotes an object ID
which is identification information (e.g., number or symbol)
allotted to and peculiar to an object.
[0341] Reference symbol 502C denotes the number of constant
reference object region time-intervals which is the number of
frame intervals having the same reference object region (N in
FIG. 62). This number N is also equal to the number of
reference frames.
[0342] Reference symbols 503C and 504C denote a start time and
an end time of object region existing time-intervals,
respectively. Each time is described by time itself or frame
number. The length of the object region existing time-interval
(a subtraction value of time or frame number) may be used
instead of the end time.
[0343] Reference symbol 505C denotes object region description
information. The object region description information 505C is
described for each reference object region interval, i.e., by
the number of the constant reference object region
time-intervals (N in the example of FIG. 62). Reference symbol
512C denotes depth information for the object, which is the
same as that in FIG. 18.
[0344] The concrete contents of each object region data
description information 505C are indicated by reference
symbols 506C to 510C shown in FIG. 62.
[0345] The reference symbols 506C and 507C denote 
a start time and an end time of the reference object re- 
gion interval, respectively. Each time is described by a 
time itself or a frame number. The length of the reference 
object region interval can be used instead of the end 
time. 

[0346] The reference symbol 508C denotes a conversion model
ID. This is intended to specify which model, such as the
enlargement and reduction model, the affine conversion model
and the parabolic conversion model, is used to describe the
object region.
[0347] Reference symbol 511C denotes the coordinates of an
origin, which determine where the origin of the conversion
model is positioned in an image. The origin coordinate data
can be omitted if a rule is predetermined, such as always
setting the center of gravity of the reference object region
at the origin.

[0348] The reference symbol 509C denotes reference object
region data to specify a reference object region. To be
specific, the reference object region data include the time of
the reference frame (or frame number) and bit map data
representing the reference object region (or a pointer to the
bit map data). It is preferable that the bit map data is
compressed and then stored since the data size is large unless
compressed.
[0349] The reference symbol 510C denotes conversion parameter
information. The conversion parameter information is described
by the number of parameters (M parameters in the example of
FIG. 62) set by a conversion model (conversion model ID). To
be specific, the conversion parameters include an arrangement
of parameter values in each frame, information for specifying
an approximate function of the parameters (coefficient values)
and the like. The conversion parameter information will be
described later in detail.
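
The description format of FIG. 62 can be pictured as a nested record. The sketch below is only an illustration: the class and field names, the Python types, and the example values are assumptions, while the reference symbols in the comments follow the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IntervalDescription:                   # object region description 505C
    start_time: float                        # 506C (time or frame number)
    end_time: float                          # 507C (or interval length)
    conversion_model_id: int                 # 508C
    reference_region: bytes                  # 509C, e.g. compressed bit map
    parameter_functions: List[List[float]]   # 510C, one entry per parameter
    origin: Optional[Tuple[float, float]] = None  # 511C, may be omitted


@dataclass
class ObjectRegionData:
    object_id: int                           # 501C
    start_time: float                        # 503C
    end_time: float                          # 504C
    intervals: List[IntervalDescription] = field(default_factory=list)
    depth: Optional[float] = None            # 512C

    @property
    def num_intervals(self) -> int:          # 502C (N in FIG. 62)
        return len(self.intervals)


# Hypothetical record with one reference object region interval.
data = ObjectRegionData(object_id=1, start_time=0.0, end_time=2.0)
data.intervals.append(
    IntervalDescription(0.0, 2.0, 5, b"", [[1.0, 0.0, 0.0] for _ in range(6)]))
```

Deriving N from the list length, rather than storing it separately, is a convenience of this sketch; a serialized format would write the count explicitly as 502C.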
[0350] By executing the above-described processings, the
object regions changing spatially and/or temporally in the
video can be recorded as simple description data.

[0351] In the above description, the object region is
expressed by the bit map, and the conversion parameter for
converting the reference object region into an object region
in the processing target frame (target object region) is
calculated. However, it is possible to approximate an object
region with an approximate figure and calculate a conversion
parameter for converting the respective representative points
of the approximate figure of the reference object region into
the corresponding representative points of an approximate
figure of an object region in a processing target frame
(target object region).

[0352] FIG. 63 shows an example of the constitution of an
object region data creating apparatus. The object region data
creating apparatus comprises the video data storage device 2C,
object region processing device 4C, a figure approximation
device 5C, the conversion parameter processing device 7C,
function approximation device 8C, and object region data
storage device 10C. If the processings executed by the
creating apparatus involve operations by a user, a GUI for
displaying video (moving image) data in, for example, units of
frames and for receiving the input of a user's command and the
like is employed (GUI is not shown in FIG. 63).

[0353] The figure approximation device 5C executes a
processing for approximating an object region by an
approximate figure and obtaining the representative points of
the approximate figure.

[0354] The conversion parameter calculation device 6C
calculates conversion parameters for converting the
representative points of the approximate figure of a reference
object region in a reference frame serving as a reference into
the representative points of the approximate figure of a
target object region in a target frame.

[0355] The function approximation device 8C approximates the
time series trajectory of each of the conversion parameters
for the representative points of the approximate figure of the
object region to a temporal function. The function
approximation device 8C is not necessary if the conversion
parameters themselves are described.

[0356] Needless to say, this object region data creating
apparatus can be realized by executing software on a computer.

[0357] FIG. 64 shows one example of the processing procedure
for the object region data creating apparatus in this
embodiment.

[0358] A step S301C is the same as step S101C in 
FIG. 58. 

[0359] In step S302C, object regions are approximated by
preset figures throughout the interval in which object regions
exist.

[0360] In the processing for approximating the object region
by a figure, an approximate region as small as possible to
surround the object region is found. As the figure used for
approximation, various figures such as a rectangle (a square,
a rectangle), a parallelogram with or without gradient, an
ellipse (including a circle) and a polygon with or without
gradient, can be employed. In addition, as the region
approximation method, various methods such as a method of
approximating a region by a circumscribed figure of the
region, a method of approximating a region by an inscribed
figure of the region, a method of setting the center of
gravity of the region to the center of gravity of an
approximate figure, a method of making the areas of the region
and the approximate figure equal and a method of minimizing
the area of a portion on which the region and an approximate
figure do not overlap each other, may be employed.
[0361] Instead of approximating the object region to 
a preset figure, the type of a figure can be specified by 
a user for each target object. Alternatively, the type of a 
figure can be automatically selected according to the 
shape or the like of the object for each target object. 
[0362] Furthermore, the processing for obtaining the 
approximate figure of the object region may be executed 
for each frame or executed by object regions in several 
frames before and after the target frame. In the latter 
case, the changes of the size and position of the approx- 
imate figure are smoothed among several frames, 
thereby making it possible to smooth the movement or 
deformation of the approximate figure or to make the 
extraction error of the object region inconspicuous. It is 
noted that the size of the approximate figure may vary 
according to the frame. 

[0363] Once the approximate figure of the object region is
obtained, a processing for extracting representative points
expressing this approximate figure is executed. Which points
are used as representative points depends on which type of
approximate figure is used. If the approximate figure is, for
example, a rectangle, four or three vertexes may be set as
representative points. If the approximate figure is a circle,
the center and one circumferential point or both end points of
the diameter may be used as representative points. Further, if
the approximate figure is an ellipse, the vertexes of the
circumscribed rectangle of the ellipse, or two foci and one
point on the ellipse (e.g., one point on the short axis of the
ellipse) may be used as representative points. If the
approximate figure is an arbitrary closed polygon, it is
necessary to use the respective vertexes of the polygon as
representative points.

[0364] The representative points are extracted in 
units of frames every time an approximate figure for one 
frame is obtained. The respective representative points 
are expressed by a horizontal coordinate x and a vertical 
coordinate y. 

[0365] A method of obtaining an approximate ellipse 
if the object region is expressed by a parallelogram is 
the same as that shown in FIG. 40. 
[0366] A method of obtaining representative points 
from the ellipse is the same as that shown in FIG. 41 . 
[0367] The approximate figure is not limited to the el- 
lipse but may be a parallelogram or a polygon. 
[0368] Next, in step S302C, a reference object region 
and a reference frame are set. The reference object re- 
gion is the approximate figure of an object region in the 
first frame (reference frame) in an object region existing 
time-interval. The positions of the representative points 
of the reference region approximate figure are stored, 
as well. 

[0369] Next, in step S303C, the representative points of the
approximate figure of the object region in a processing target
frame are made to correspond to those of the approximate
figure of the reference object region.

[0370] FIG. 65 shows one example of how to make the former
representative points correspond to the latter representative
points. In FIG. 65, reference symbol 1000C denotes the centers
of gravity of approximate rectangles. In FIG. 65, the
approximate figure 1001C of the reference object region and
the approximate figure 1002C of the target object region are
obtained.

[0371] First, either the approximate figure 1001C or 1002C is
moved in parallel, to thereby make the positions of the
centers of gravity of both figures 1001C and 1002C coincident
with each other (FIG. 65 shows a state in which the positions
of the centers of gravity are coincident with each other).

[0372] Thereafter, distances d1 to d4 between the four
vertexes of the figure 1001C and those of the figure 1002C are
calculated, respectively, and the sums of the distances are
obtained for all combinations of the vertexes.

[0373] Among them, a combination having the smallest sum of
distances is obtained and the representative points of the
combination are made to correspond to one another.
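
The correspondence procedure of the preceding paragraphs (align the centers of gravity, sum the vertex distances for each combination, keep the smallest sum) can be sketched as follows. The function name `match_vertices` is hypothetical, the center of gravity is taken as the mean of the vertexes, and "all combinations" is read here as all orderings of the target vertexes; restricting the search to cyclic orderings would be a cheaper variant.

```python
import math
from itertools import permutations


def match_vertices(ref, tgt):
    """Return `order` such that tgt[order[i]] corresponds to ref[i]."""
    def centroid(pts):
        return (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))

    (rx, ry), (tx, ty) = centroid(ref), centroid(tgt)
    # Translate the target figure so both centers of gravity coincide.
    shifted = [(x - tx + rx, y - ty + ry) for x, y in tgt]

    def total(order):
        return sum(math.dist(ref[i], shifted[j]) for i, j in enumerate(order))

    return min(permutations(range(len(tgt))), key=total)


# Hypothetical rectangles: the target is the reference moved by (10, 5),
# with its vertexes listed starting from a different corner.
ref = [(0, 0), (4, 0), (4, 2), (0, 2)]
tgt = [(14, 7), (10, 7), (10, 5), (14, 5)]
order = match_vertices(ref, tgt)
```

For four vertexes the search covers 24 orderings, which is negligible; for polygons with many vertexes the cyclic restriction becomes worthwhile.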

[0374] It is noted that there are cases where it is difficult
to make the representative points of the approximate figure of
the object region correspond to those of the approximate
figure of the reference object region in this method. For
example, if an approximate rectangle is close to a square and
rotates by 45 degrees, it is difficult to make the
representative points of the approximate figure of the object
region correspond to those of the approximate figure of the
reference object region (since the sum of distances is almost
equal between the two combinations). In that case, therefore,
a method including obtaining the exclusive OR of the object
regions in the approximate figures and adopting a combination
having the smallest area of the figures, or a method including
obtaining the absolute difference in texture between object
regions and obtaining a combination having the smallest
difference value, may be employed.
[0375] In step S304C, conversion parameters are cal- 
culated from the moving vectors of the representative 
points of the approximate figure of the object region. 

[0376] In this processing, the movements of the representative
points are used instead of an optical flow and conversion
parameters are thereby calculated by the same processing as
that of step S102C shown in FIG. 58. In this case, however,
due to the small number of representative points, the
conversion parameters cannot always be obtained. In the case
of, for example, a rectangle, an ellipse or a parallelogram,
each has three representative points, and the eight parameters
of a projection conversion model cannot be obtained from the
moving vectors of these three representative points, since
three points supply only six equations. FIG. 66 shows the
relationship between the types of figures used for
approximation and conversion models for which conversion
parameters can be obtained. In FIG. 66, symbol O denotes a
combination capable of calculating parameters and symbol x
denotes a combination incapable of calculating parameters.
[0377] In step S305C, the conversion parameters obtained in
step S304C are approximated by a temporal function, which
processing is the same as that in step S103C shown in FIG. 58.

[0378] In step S306C, it is determined whether or not it is
necessary to update the reference object region. In this
processing, the reference object region is first converted by
the conversion parameters and a predicted object region in a
current frame is calculated. Needless to say, it is possible
to calculate the same predicted object region by converting
only the representative points of the reference object region
using the conversion parameters and constituting a figure
specified by the converted representative points. Next, the
error between the predicted object region and the approximate
figure of the target object region in the current frame is
calculated and, by comparing the error with a threshold value,
it is determined whether or not the reference object region
needs to be updated.
[0379] In step S307C, the reference object region is actually
updated after it is determined at step S306C that the
reference object region needs to be updated. While setting the
processing target frame as a reference frame, the approximate
figure of the object region in the frame is stored as a new
reference object region and the coordinates of the
representative points of the reference object region are
stored, as well.
[0380] In step S308C, it is determined whether or not the
description of the object region in the video is ended as in
the case of step S106C shown in FIG. 58.
[0381] In step S309C, the calculated information on the object
region (function parameters approximating the conversion
parameters) is recorded in a predetermined description format
in the same manner as in step S107C shown in FIG. 58.

[0382] FIG. 67 shows one example of a description format for
the object region data. This description format is the same as
that shown in FIG. 62 except for figure information 1109C. The
figure information 1109C used instead of the reference object
region information 509C shown in FIG. 62 comprises an ID
specifying a figure type and the coordinates of the
representative points of the approximate figure of the
reference object region. Symbol M denotes the number of
representative points necessary for a figure specified by the
ID.
[0383] Next, variations relating to the data structure 
of object region data will be described hereinafter. 
[0384] In the above-described examples, conversion parameters
are obtained for all frames with respect to a certain object
region. Frames for which conversion parameters are obtained
may be sampled. For example, one frame out of three frames may
be sampled and a reference object region in frame 1 as well as
reference object regions in frames 4, 7, ... may be used.
[0385] If conversion parameters are expressed by a
temporal function and information for specifying the
function is described in the object region data, then the
sampled parameter values may be approximated by a
function as in the case of the above-described examples.
In this case, it is not necessary to include information on
sampling in the object region data.
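
As a minimal sketch of approximating sampled conversion-parameter values by a temporal function, the following fits a straight line by least squares. The choice of a linear function and the names `fit_line` and `evaluate` are assumptions made for illustration; the text does not prescribe a particular function family here.

```python
from typing import Sequence, Tuple


def fit_line(frames: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * frame + intercept."""
    n = len(frames)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (frame, value) samples")
    mean_t = sum(frames) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in frames)
    if var_t == 0:
        raise ValueError("frame numbers must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(frames, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def evaluate(coeffs: Tuple[float, float], frame: float) -> float:
    """Value of the fitted function at any frame, sampled or not."""
    slope, intercept = coeffs
    return slope * frame + intercept


# Parameter values sampled only at frames 1, 4, 7 and 10.
coeffs = fit_line([1, 4, 7, 10], [3.0, 9.0, 15.0, 21.0])
value_at_frame_5 = evaluate(coeffs, 5)
```

Because only the function parameters are stored, the value at an unsampled frame such as frame 5 is obtained by evaluating the function, which is why no sampling information needs to accompany the data.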

[0386] Meanwhile, if conversion parameter values are
directly described in the object region data, then either
(1) parameter values in the frames which are not sampled
are appropriately interpolated (e.g., for each unsampled
frame, the same values as those in the most recently
sampled frame are described in the object region data) and
the same object region data as that in FIG. 62 is prepared,
or (2) sampling information 520C as shown in FIG. 68 is
added to the object region data, and only the parameter
values in the sampled frames and information specifying
the sampling method are described, the latter in the
sampling information 520C in the first embodiment (e.g.,
a numeric value "n" indicating that one frame is sampled
out of every n frames; note, however, that n = 1 is taken
to mean that all frames are sampled). In method (2), when
the object region data is used, parameter values in the
frames which have not been sampled can be interpolated,
if necessary, by referring to the sampling information
520C.
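
The two options can be illustrated with a short sketch. It assumes a list of per-frame parameter values indexed from 0 (so frames 1, 4, 7 of the text correspond to indices 0, 3, 6); the function names are hypothetical, and holding the last sampled value is just the interpolation example given in the text.

```python
from typing import List, Sequence


def sample_parameters(values: Sequence[float], n: int) -> List[float]:
    """Keep one value out of every n frames, starting from the first frame."""
    if n < 1:
        raise ValueError("n must be at least 1 (n = 1 keeps every frame)")
    return list(values[::n])


def restore_parameters(sampled: Sequence[float], n: int, frame_count: int) -> List[float]:
    """Rebuild per-frame values by repeating the most recent sampled value."""
    return [sampled[i // n] for i in range(frame_count)]


per_frame = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
sampled = sample_parameters(per_frame, 3)       # stored together with n = 3
restored = restore_parameters(sampled, 3, len(per_frame))
```

Under option (1) the writer would store `restored`; under option (2) it would store `sampled` plus the value n, and a reader would call something like `restore_parameters` only when it needs values for unsampled frames.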

[0387] Next, a description will be given of a method of
generating object region data by dividing one object into
a plurality of regions in the above-described embodiments.

[0388] Conventionally, one conversion parameter is
obtained for one object. In the case of an object whose
apparent shape changes greatly, however, it is sometimes
preferable to divide the object into a plurality of regions
and use conversion parameters for the respective
regions. For example, a walking person moves his or her
hands and legs a great deal but moves his or her head and
body much less. In that case, it is possible to obtain
conversion parameters for the respective parts in a stable
manner by dividing the object into separate regions
of head/body/hands/legs rather than dealing with the
person as one object.

[0389] If one object is expressed by a plurality of figures,
processing for dividing the object into a plurality of
regions is required. This processing may be executed by
any method, such as directly inputting figures manually.
In that case, the processing can be realized by operations
such as using a pointing device such as a mouse to
surround regions with rectangles or ellipses on an image,
or designating regions by the trajectory of the pointing
device. Further, if the input operation is carried out not
manually but automatically, there is a proposed method of
realizing the processing by, for example, clustering the
movement of an object. According to this method, the
movements of the respective regions of the object between
consecutive frames are calculated by a correlation method
(see, for example, Gazo Kaiseki Handbook (Image Analysis
Handbook), Section II, Chapter 3, Tokyo University
Publication, 1991) or a gradient method (see, for example,
B. K. P. Horn and B. G. Schunck, "Determining optical
flow", Artificial Intelligence, vol. 17, pp. 185-203, 1981),
and only the similar movements among them are gathered
to thereby form regions.
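
The gathering of similar movements can be sketched as follows. The sketch assumes the motion vectors of small blocks have already been computed (by a correlation or gradient method, which is not shown) and uses a simple greedy grouping with a distance threshold. That grouping rule, the threshold, and the function name are assumptions chosen for brevity, not the clustering procedure of the cited references.

```python
import math
from typing import List, Sequence, Tuple


def cluster_motions(vectors: Sequence[Tuple[float, float]],
                    threshold: float) -> List[List[int]]:
    """Group block indices whose motion vectors are close to a cluster mean."""
    clusters = []  # each entry: {"sum": [sx, sy], "members": [block indices]}
    for idx, (dx, dy) in enumerate(vectors):
        for cluster in clusters:
            count = len(cluster["members"])
            mean_x = cluster["sum"][0] / count
            mean_y = cluster["sum"][1] / count
            if math.hypot(dx - mean_x, dy - mean_y) <= threshold:
                cluster["sum"][0] += dx
                cluster["sum"][1] += dy
                cluster["members"].append(idx)
                break
        else:
            # No existing cluster moves similarly: start a new region.
            clusters.append({"sum": [dx, dy], "members": [idx]})
    return [cluster["members"] for cluster in clusters]


# Blocks 0, 1, 4 are nearly still (e.g. body); blocks 2, 3 move right (e.g. arm).
flow = [(0.1, 0.0), (0.0, 0.1), (5.0, 0.2), (5.1, 0.0), (0.05, 0.05)]
regions = cluster_motions(flow, threshold=1.0)
```

Each resulting group of block indices would then be approximated by its own figure and given its own conversion parameters.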

[0390] FIG. 69 shows a state in which regions having
a similar optical flow are gathered together and an object
is thereby divided into a plurality of regions.
[0391] FIG. 70 shows one example of a data structure
for describing an object in a plurality of regions. The
example of FIG. 70 expands the data structure (FIG.
67) for describing the object in a single region; the data
following region ID data 2906C are the same as those
in FIG. 67. The number of divided regions is stored in
2902C, and data on the respective divided regions are
held in 2905C and the following.
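
A rough sketch of such an expanded structure is given below. The class and field names are hypothetical, and the per-region payload is left as an opaque dictionary standing in for the single-region data of FIG. 67; only the idea of a region count followed by one record per divided region is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RegionData:
    """One divided region: a region ID followed by single-region data."""
    region_id: int
    single_region_data: Dict[str, Any]  # stand-in for the FIG. 67 layout


@dataclass
class MultiRegionObject:
    """An object described as several regions."""
    regions: List[RegionData] = field(default_factory=list)

    @property
    def region_count(self) -> int:
        """Number of divided regions, stored ahead of the region records."""
        return len(self.regions)

    def to_records(self) -> list:
        """Flatten to: region count, then (ID, data) for each region."""
        return [self.region_count] + [
            (r.region_id, r.single_region_data) for r in self.regions
        ]


person = MultiRegionObject([
    RegionData(0, {"part": "head"}),
    RegionData(1, {"part": "body"}),
])
records = person.to_records()
```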
[0392] Though the seventh embodiment adds the
depth information to the object region of an arbitrary
frame, which is described using reference object region
data in a reference frame and a conversion parameter
indicating the conversion from the reference object
region to the object region in the arbitrary frame, it is also
possible to add the display flag, passing range information,
and panorama conversion parameters for mosaicking
to the above-described object region data.
[0393] While in each of the above embodiments,
information that determines the approximate figure is
used as the representative points of a figure
approximating the object region, a plurality of characteristic
points extracted from the object region in the image may
be used as the representative points of the figure.
Various things can be considered as characteristic points.
For instance, the corners of an object (for example, as
described in L. Kitchen and A. Rosenfeld, "Gray-level
corner detection," Pattern Recognition Letters, No. 1,
1982, pp. 95-102) and the center of gravity of an object
may be considered. In this method, there is not enough
information to determine an approximate figure.
Consequently, it is impossible to determine the approximate
figure itself from the object region data, but the processing
at the upper-layer processing unit becomes simpler.
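
As one concrete case of a characteristic point, the center of gravity of an object region can be computed directly from a mask image. The sketch below assumes the region is given as a list of rows of 0/1 pixel values; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def center_of_gravity(mask: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Mean (x, y) position of all pixels inside the object region."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        raise ValueError("mask contains no object pixels")
    return sum_x / total, sum_y / total


mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
centroid = center_of_gravity(mask)  # (1.5, 0.5)
```

A single point like this gives a position to track over frames but, as noted above, does not by itself determine an approximate figure.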
[0394] The data format of the object region data is
similar to that in the case of the representative points.
Only the approximate figure data is changed to
characteristic point data, the number of the representative
points to the number of the characteristic points, and the
representative point trajectory to the characteristic point
trajectory. The number of the approximate figures and
the figure type ID are omitted.
[0395] The methods and apparatus of the present
invention apply to a computer-readable recording medium
in which a program is recorded that causes a computer
to execute a procedure equivalent to the present
invention (or to function as means equivalent to the present
invention or to realize a function equivalent to the
present invention).



[0396] According to embodiments of the present
invention, the region of the target object in the video is
described as the parameters of the function that
approximates the trajectory obtained by arranging, in the
direction of frame advance, the quantity indicating the
position of the representative points of an approximate figure
for the object region. This makes it possible to describe
the region of the desired object in the image using a
smaller amount of data and facilitates the creation and
handling of the data.
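
The arrangement of representative-point positions along the direction of frame advance can be sketched as below, using the variant in which one representative point is kept as an absolute position and the others as difference vectors from it. The function names and the list-based layout are assumptions; fitting a function to each resulting trajectory is a separate step not shown here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_trajectories(frames: Sequence[Sequence[Point]]) -> List[List[float]]:
    """One time series per quantity: x0, y0, then (dx, dy) for each other point."""
    rows = []
    for points in frames:
        x0, y0 = points[0]
        row = [x0, y0]
        for x, y in points[1:]:
            row += [x - x0, y - y0]  # difference vector from the first point
        rows.append(row)
    # Transpose so that each list runs in the direction of frame advance.
    return [list(column) for column in zip(*rows)]


def points_at(trajectories: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Recover the representative points of frame index k."""
    values = [series[k] for series in trajectories]
    x0, y0 = values[0], values[1]
    points = [(x0, y0)]
    for i in range(2, len(values), 2):
        points.append((x0 + values[i], y0 + values[i + 1]))
    return points


# A three-point figure translating by (1, 1) between two frames.
frames = [
    [(0, 0), (4, 0), (4, 2)],
    [(1, 1), (5, 1), (5, 3)],
]
trajectories = to_trajectories(frames)
```

For a figure that only translates, the difference-vector trajectories stay constant, so they can be approximated with very few function parameters.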

[0397] Furthermore, according to embodiments of the
present invention, it is possible to search for an object
in the image efficiently and effectively.

Claims 

1. A method of describing object region data about an
object in video data over a plurality of frames, said 
method characterized by comprising: 

approximating (S2) the object using a figure for 
each of said frames; 

extracting (S3) a plurality of points representing 
the figure for each of said frames; 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, position data 
about one of said plurality of points and relative 
position data about remaining points with refer- 
ence to said one of said plurality of points; and 
describing the object region data using the 
functions. 

2. The method according to claim 1, characterized in
that said relative position data are components of 
differential vectors between the one of said plurality 
of points and remaining points. 

3. A method of describing object region data about an
object in video data over a plurality of frames, said 
method characterized by comprising: 

approximating (S2) the object using a figure for 
each of said frames;

extracting (S3) a plurality of points representing 
the figure for each of said frames; 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, position data
about said plurality of points in a reference
frame and relative position data about said plu-
rality of points in a succeeding frame with ref-
erence to the position data about said plurality
of points in the reference frame; and

describing the object region data using the 
functions. 






4. The method according to claim 3, characterized in 
that said relative position data are components of 
differential vectors between said plurality of points 
in the reference frame and said plurality of points in 
the succeeding frame. 

5. The method according to any one of claims 1 to 4, 
characterized in that said object region data com- 
prises parameters of the functions. 

6. A method of describing object region data about an 
object in video data over a plurality of frames, said 
method characterized by comprising: 

approximating (S2) the object using a figure for 
each of said frames; 

extracting (S3) a plurality of points representing 
the figure for each of said frames; 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
describing the object region data using the 
functions and depth information of the object. 

7. The method according to claim 6, characterized in 
that said object region data is described by using 
the depth information of the object and parameters 
of the functions. 

8. The method according to claim 6 or 7, character- 
ized in that said depth information is a relative 
depth and has a discrete level value. 

9. A method of describing object region data about an 
object in video data over a plurality of frames, said 
method characterized by comprising: 

approximating (S2) the object using a figure for 
each of said frames; 

extracting (S3) a plurality of points representing 
the figure for each of said frames; 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
describing the object region data using the 
functions and display flag information indicat- 
ing a range of frames in which the object or 
each of said points is visible or not. 

10. The method according to claim 9, characterized in 
that said object region data is described by using 
the display flag information and parameters of the 
functions. 

11. A method of describing object region data about an
object in video data over a plurality of frames, said
method characterized by comprising:

approximating (S2) the object using a figure for 
each of said frames; 

extracting (S3) a plurality of points representing
the figure for each of said frames;
approximating (S4) trajectories with functions,
the trajectories being obtained by arranging, in
the frames advancing direction, data indicating
positions of said plurality of points; and
describing the object region data using the
functions and object passing range information
indicating a range where the figure approximat-
ing the object exists over said plurality of frames.

12. The method according to claim 11, characterized
in that said object region data is described by using 
the object passing range information and parame- 
ters of the functions. 

13. A method of describing object region data about an
object moving in a panorama image formed by com-
bining a plurality of frames with being overlapped,
said method characterized by comprising:

approximating (S2) the object in the panorama
image using a figure;

extracting (S3) a plurality of points representing
the figure in a coordinate system of the pano-
rama image;

approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
describing the object region data using the
functions.

14. The method according to claim 13, characterized
in that said object region data comprises parame-
ters of the functions.

15. The method according to any one of claims 1, 3, 6,
9, 11 and 13, characterized in that said object re-
gion data comprises information representing a
range of frames in which the object exists in the vid-
eo data and information identifying the figure ap-
proximating the object region.

16. The method according to any one of claims 1, 3, 6,
9, 11 and 13, characterized in that said object re-
gion data comprises one of information represent-
ing related information linking to the object and in- 
formation representing a method of accessing the 
related information. 

17. An article of manufacture comprising a computer
usable medium having computer readable program
code means embodied therein and for describing
object region data about an object in video data over
a plurality of frames, the computer readable pro-
gram code means characterized by comprising:

computer readable program code means for 
approximating (S2) the object using a figure for 
each of said frames; 

computer readable program code means for 
extracting (S3) a plurality of points representing 
the figure for each of said frames; 
computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, position data 
about one of said plurality of points and relative 
position data about remaining points with refer- 
ence to said one of said plurality of points; and 
computer readable program code means for 
describing the object region data using the 
functions. 

18. An article of manufacture comprising a computer 
usable medium having computer readable program 
code means embodied therein and for describing 
object region data about an object in video data over 
a plurality of frames, the computer readable pro- 
gram code means characterized by comprising: 

computer readable program code means for 
approximating (S2) the object using a figure for 
each of said frames; 

computer readable program code means for 
extracting (S3) a plurality of points representing 
the figure for each of said frames; 
computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, position data 
about said plurality of points in a reference 
frame and relative position data about said plu- 
rality of points in a succeeding frame with ref- 
erence to the position data about said plurality 
of points in the reference frame; and 
computer readable program code means for 
describing the object region data using the 
functions. 

19. An article of manufacture comprising a computer 
usable medium having computer readable program 
code means embodied therein and for describing 
object region data about an object in video data over 
a plurality of frames, the computer readable pro- 
gram code means characterized by comprising: 

computer readable program code means for 
approximating (S2) the object using a figure for 
each of said frames;

computer readable program code means for
extracting (S3) a plurality of points representing 
the figure for each of said frames; 
computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
computer readable program code means for 
describing the object region data using the 
functions and depth information of the object. 

20. An article of manufacture comprising a computer 
usable medium having computer readable program 
code means embodied therein and for describing 
object region data about an object in video data over 
a plurality of frames, the computer readable pro- 
gram code means characterized by comprising: 

computer readable program code means for 
approximating (S2) the object using a figure for 
each of said frames; 

computer readable program code means for 
extracting (S3) a plurality of points representing 
the figure for each of said frames; 
computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
computer readable program code means for 
describing the object region data using the 
functions and display flag information indicat- 
ing a range of frames in which the object or 
each of said points is visible or not. 

21 . An article of manufacture comprising a computer 
usable medium having computer readable program 
code means embodied therein and for describing 
object region data about an object in video data over 
a plurality of frames, the computer readable pro- 
gram code means characterized by comprising: 

computer readable program code means for 
approximating (S2) the object using a figure for 
each of said frames; 

computer readable program code means for 
extracting (S3) a plurality of points representing 
the figure for each of said frames; 
computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
computer readable program code means for 
describing the object region data using the 
functions and object passing range information 
indicating a range where the figure approximat-
ing the object exists over said plurality of frames.

22. An article of manufacture comprising a computer
usable medium having computer readable program
code means embodied therein and for describing
object region data about an object moving in a
panorama image formed by combining a plurality of
frames with being overlapped, the computer readable
program code means characterized by comprising:

computer readable program code means for 
approximating (S2) the object in the panorama 
image using a figure; 

computer readable program code means for 
extracting (S3) a plurality of points representing 
the figure in a coordinate system of the pano- 
rama image; 

computer readable program code means for 
approximating (S4) trajectories with functions, 
the trajectories being obtained by arranging, in 
the frames advancing direction, data indicating 
positions of said plurality of points; and 
computer readable program code means for 
describing the object region data using the 
functions. 

23. A computer data signal embodied in a carrier wave, 
the computer data signal capable of describing ob- 
ject region data about an object in video data over 
a plurality of frames, the computer data signal char-
acterized by comprising:

program code portion for causing a computer 
to approximate the object using a figure for 
each of said frames; 

program code portion for causing a computer 
to extract a plurality of points representing the 
figure for each of said frames; 
program code portion for causing a computer 
to approximate trajectories with functions, the 
trajectories being obtained by arranging, in the 
frames advancing direction, position data about 
one of said plurality of points and relative posi- 
tion data about remaining points with reference 
to said one of said plurality of points; and 
program code portion for causing a computer 
to describe the object region data using the 
functions. 


24. A computer data signal embodied in a carrier wave, 
the computer data signal capable of describing ob- 
ject region data about an object in video data over 
a plurality of frames, the computer data signal char- 
acterized by comprising: 

program code portion for causing a computer
to approximate the object using a figure for
each of said frames;

program code portion for causing a computer 
to extract a plurality of points representing the 
figure for each of said frames; 
program code portion for causing a computer 
to approximate trajectories with functions, the 
trajectories being obtained by arranging, in the 
frames advancing direction, position data about 
said plurality of points in a reference frame and 
relative position data about said plurality of 
points in a succeeding frame with reference to 
the position data about said plurality of points 
in the reference frame; and 
program code portion for causing a computer 
to describe the object region data using the 
functions. 

25. A computer data signal embodied in a carrier wave, 
the computer data signal capable of describing ob- 
ject region data about an object in video data over 
a plurality of frames, the computer data signal char- 
acterized by comprising: 

program code portion for causing a computer 
to approximate the object using a figure for 
each of said frames; 

program code portion for causing a computer 
to extract a plurality of points representing the 
figure for each of said frames; 
program code portion for causing a computer 
to approximate trajectories with functions, the 
trajectories being obtained by arranging, in the 
frames advancing direction, data indicating po- 
sitions of said plurality of points; and 
program code portion for causing a computer 
to describe the object region data using the 
functions and depth information of the object. 

26. A computer data signal embodied in a carrier wave, 
the computer data signal capable of describing ob- 
ject region data about an object in video data over 
a plurality of frames, the computer data signal char- 
acterized by comprising: 

program code portion for causing a computer 
to approximate the object using a figure for 
each of said frames; 

program code portion for causing a computer 
to extract a plurality of points representing the 
figure for each of said frames; 
program code portion for causing a computer 
to approximate trajectories with functions, the 
trajectories being obtained by arranging, in the 
frames advancing direction, data indicating po- 
sitions of said plurality of points; and 
program code portion for causing a computer 
to describe the object region data using the 
functions and display flag information indicat-
ing a range of frames in which the object or
each of said points is visible or not.

27. A computer data signal embodied in a carrier wave,
the computer data signal capable of describing ob-
ject region data about an object in video data over
a plurality of frames, the computer data signal char-
acterized by comprising:

program code portion for causing a computer
to approximate the object using a figure for
each of said frames;

program code portion for causing a computer
to extract a plurality of points representing the
figure for each of said frames;
program code portion for causing a computer
to approximate trajectories with functions, the
trajectories being obtained by arranging, in the
frames advancing direction, data indicating po-
sitions of said plurality of points; and
program code portion for causing a computer
to describe the object region data using the
functions and object passing range information
indicating a range where the figure approximat-
ing the object exists over said plurality of frames.

28. A computer data signal embodied in a carrier wave,
the computer data signal capable of describing ob-
ject region data about an object moving in a pano-
rama image formed by combining a plurality of
frames with being overlapped, the computer data
signal characterized by comprising:

program code portion for causing a computer
to approximate the object in the panorama im-
age using a figure;

program code portion for causing a computer
to extract a plurality of points representing the
figure in a coordinate system of the panorama
image;
program code portion for causing a computer 
to approximate trajectories with functions, the 
trajectories being obtained by arranging, in the 
frames advancing direction, data indicating po- 
sitions of said plurality of points; and 
program code portion for causing a computer 
to describe the object region data using the 
functions. 

29. A carrier medium carrying computer readable in-
structions for controlling the computer to carry out
the method of any one of claims 1 to 16.